Network architecture

ABSTRACT

The present invention provides a network architecture. An embodiment includes a plurality of nodes interconnected by links. Each node can maintain knowledge of other nodes in a database. The database contains a list of other nodes in the network, and a ‘next-best-step’ for each of those other nodes, pointing to a neighbouring node that is the next best step to that other node. Where a particular node of the network is not in the list, then the next-best-step is assumed to be the next-best-step most commonly identified in the database. Such a network will form a “core” wherein any node in the network can find any other node in the network by first seeking out that other node at the core. Once the nodes locate each other via the core, a more optimal route forms in the network along the most desirable path between those nodes.

PRIORITY CLAIMS

The present application is a continuation of U.S. patent application Ser. No. 10/598,020, filed May 4, 2007, which claims priority from Canadian Patent Application No. 2,457,909, filed Feb. 16, 2004, U.S. Provisional Patent Application No. 60/544,341, filed Feb. 17, 2004, Canadian Patent Application No. 2,464,274, filed Apr. 20, 2004, Canadian Patent Application No. 2,467,063, filed May 17, 2004, Canadian Patent Application No. 2,471,929, filed Jun. 22, 2004, Canadian Patent Application No. 2,476,928, filed Aug. 16, 2004 and Canadian Patent Application No. 2,479,485, filed Sep. 20, 2004, the contents of all of which are incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates generally to electronic, telecommunication and computing devices that communicate with each other and more particularly to a network architecture therefor.

BACKGROUND OF THE INVENTION

Networked devices are now an extremely important aspect of our social fabric. The public switched telephone network (“PSTN”) is perhaps the first example of a ubiquitous network of telecommunication devices that changed the way people interact. Now, mobile telephone networks, the Internet, local area networks (“LAN”), wide area networks (“WAN”) and voice over internet protocol (“VOIP”) networks are widely deployed and growing.

It is trite to say that each of these devices needs to be able to reach each other in order to fulfill networking functions. With the PSTN, a system of telephone numbers is employed, including country codes, area codes, local exchanges, etc. At least in North America, the explosion of telephonic devices has stretched the standard ten digit number scheme. With the Internet, the Internet Protocol Version 4 (“IPV4”) promulgates a system of Internet Protocol (“IP”) addresses to identify points on the Internet, and thus each networked device has an address making it reachable on the Internet. Due at least in part to the limited length of the IPV4 address field, IP addresses can bear little geographic relationship to their physical location. As a result, routers and routing tables throughout the Internet are extremely bloated, increasing complexity in traffic routing and increasing network latency. IPV6 offers potential relief in the form of a much larger address space, but the upgrade to IPV6 is expected to be slow.

In very general terms, many prior art network architectures rely on routing devices to maintain addresses and locations of the devices throughout the network. Such routing devices are essentially traffic cops, routing traffic along appropriate pathways. Such architectures become clumsy and awkward as the networks grow.

Various “router-less” network architectures have been proposed. Some of these architectures are referred to as peer-to-peer networks, while others are referred to as ad-hoc networks. Regardless, these prior art architectures also tend to suffer from scaling and/or other limitations. One attempt to improve network architectures is Ad Hoc On Demand Distance Vector (“AODV”). AODV is a reactive protocol that uses a broadcast flood in order to establish a new connection or fix a broken connection. AODV is described in detail in the Internet Engineering Task Force (“IETF”) document found at http://www.ietf.org/rfc/rfc3561.txt. While AODV has the advantage of being able to easily organize nodes into an ad-hoc network, one of the problems it has is that the maximum network size is extremely limited.

Another attempt to improve network architectures is ‘Destination Sequenced Distance Vector’ (“DSDV”). DSDV is a proactive protocol that uses a constant flood of updates to create and maintain routes to and from all nodes in the network. A detailed description of DSDV is found at http://citeseer.ist.psu.edu/cache/papers/cs/2258/http:zSzzSzwww.srvloc.orgzSzcharliepzSztxtzSzsigcomm94zSzpaper.pdf/perkins94highly.pdf or http://citeseer.ist.psu.edu/perkins94highly.html. While DSDV has the advantage of providing loop-free routing, it has the disadvantage of only working in small networks. In large networks the control traffic easily exceeds the available bandwidth.

Another attempt to improve network architectures is ‘Optimized Link State Routing’ (“OLSR”). OLSR is a proactive protocol that attempts to build knowledge of the network topology. A detailed description of OLSR can be found in the IETF draft at http://hipercom.inria.fr/olsr/draft-ietf-manet-olsr-11.txt. While OLSR has the advantage of being a more efficient link state protocol, it is still unable to support larger networks.

Another attempt to improve network architectures is ‘Open Shortest Path First’ (“OSPF”). OSPF is a proactive link state protocol that is used by some internet core routers. A detailed description of OSPF can be found in the IETF document at http://www.ietf.org/rfc/rfc1247.txt. While OSPF allows core internet routers to route around failures, it has limitations on the size of networks it is able to support.

Despite the differences between AODV, DSDV, OLSR and OSPF, they all share some of the same problems—e.g. the difficulty of scaling past a few hundred nodes. This limitation occurs because as the network grows, the amount of control traffic required grows much faster. Rapidly, the amount of control traffic needed will exceed the capacity of the network.

In general, prior art network architectures do not provide good scalability, nor do they provide the ability to allow low capacity devices to fully interact with the larger network, and in mobile environments, prior art architectures do not always provide seamless mobility.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a novel system and method for networking that obviates or mitigates at least one of the above-identified disadvantages of the prior art.

A first aspect of the invention provides a network that comprises a plurality of nodes and a plurality of links interconnecting neighbouring ones of the nodes. Each of the nodes is operable to maintain information about each of the nodes that are within a first portion of the nodes. The information includes: a first identity of another one of the nodes within the first portion; and, for each first identity, a second identity representing a neighbouring node that is a desired step to reach the another one of the nodes respective to the first identity. Each of the nodes is operable to determine a neighbouring node that is a desired step to locate the nodes in a second portion of the nodes that are not included in the first portion.

In a particular implementation of the first aspect, the determination is based on which of the neighbouring nodes most frequently appears in each second identity.
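By way of illustration only, the following sketch shows one way such a determination could be computed, assuming the information of the first aspect is held as a simple mapping from each known node's first identity to the second identity of the desired neighbouring step (the function and variable names are illustrative, not part of the invention):

```python
from collections import Counter

def default_next_step(routing_info):
    """Pick the neighbour that appears most often as a second identity.

    routing_info: dict mapping a known node's identity (first identity) to
    the identity of the neighbouring node that is the desired step toward
    it (second identity).  Returns the neighbour to use for nodes that are
    not in the mapping, or None if the mapping is empty.
    """
    if not routing_info:
        return None
    counts = Counter(routing_info.values())
    # most_common(1) returns [(neighbour, count)] for the top entry
    return counts.most_common(1)[0][0]
```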

In a particular implementation of the first aspect, each of the nodes is operable to exchange the information with its neighbouring nodes.

In a particular implementation of the first aspect, each link has a set of service characteristics such that any path between two of the nodes has a cumulative set of service characteristics; and wherein the desired step is based on which of the paths has a desired cumulative set of service characteristics.
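As a non-limiting sketch of how per-link service characteristics could be accumulated over a path, the following assumes each link is described by a bandwidth, a latency and a bit error rate (the field names and combination rules are illustrative assumptions only):

```python
def path_characteristics(links):
    """Combine per-link service characteristics into a cumulative set.

    links: list of dicts, each with 'bandwidth' (Megabit/second),
    'latency' (milliseconds) and 'ber' (bit error rate per bit).
    """
    bandwidth = min(link['bandwidth'] for link in links)  # bottleneck link
    latency = sum(link['latency'] for link in links)      # delays add up
    survive = 1.0
    for link in links:
        survive *= (1.0 - link['ber'])  # probability a bit crosses every link
    return {'bandwidth': bandwidth, 'latency': latency, 'ber': 1.0 - survive}
```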

In a particular implementation of the first aspect, the service characteristics include at least one of bandwidth, latency and bit error rate.

In a particular implementation of the first aspect, the nodes are at least one of computers, telephones, sensors and personal digital assistants.

In a particular implementation of the first aspect, the links are based on at least one of wired and wireless connections.

In a particular implementation of the first aspect, a network core is formed between neighbouring nodes that determine each other as the desired step to reach the nodes within the second portion.

In a particular implementation of the first aspect, each node is operable to instruct other nodes between the core and the node to maintain information about the node.

In a particular implementation of the first aspect, each node is operable to request information about the nodes within the second portion; each node being operable to make the request to the other nodes between the core and the node.

One advantage of the present invention over the prior art is that the network architecture taught herein allows for large scale self-organizing networks. This feature is enabled, for certain embodiments, because very few nodes in the network need actually have knowledge of the entire network. Collectively, all nodes in the network have knowledge of the entire network, and nodes that are unaware of other nodes, but which need to find such other nodes, are provided with means of locating those other nodes by seeking such knowledge from other nodes in the network having relevant knowledge. For these and other reasons, the present invention is a novel self-organizing network architecture that enables substantially larger self-organizing networks than prior art self-organizing network architectures. Thus, a second aspect of the invention provides a self-organizing network comprising at least 2,000 nodes interconnected by a plurality of links. A third aspect of the invention provides a self-organizing network comprising at least 5,000 nodes interconnected by a plurality of links. A fourth aspect of the invention provides a self-organizing network comprising at least 10,000 nodes interconnected by a plurality of links. A fifth aspect of the invention provides a self-organizing network comprising at least 100,000 nodes interconnected by a plurality of links.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will now be described by way of example only, and with reference to the accompanying drawings, in which:

FIG. 1 is a schematic representation of a network in accordance with an embodiment of the invention;

FIG. 2 shows a flow-chart depicting a method of spreading network knowledge in accordance with an embodiment of the invention;

FIG. 3 is a schematic representation of a network depicting a performance of a step of the method of FIG. 2, in accordance with an embodiment of the invention;

FIG. 4 is a schematic representation of a network depicting a performance of a step of the method of FIG. 2, in accordance with an embodiment of the invention;

FIG. 5 is a schematic representation of a network depicting a performance of a step of the method of FIG. 2, in accordance with an embodiment of the invention;

FIG. 6 is a schematic representation of a network depicting a performance of a step of the method of FIG. 2, in accordance with an embodiment of the invention;

FIG. 7 is a schematic representation of a network depicting a performance of a step of the method of FIG. 2, in accordance with an embodiment of the invention;

FIG. 8 is a schematic representation of a network depicting a performance of a step of the method of FIG. 2, in accordance with an embodiment of the invention;

FIG. 9 is a schematic representation of a network depicting a performance of a step of the method of FIG. 2, in accordance with an embodiment of the invention;

FIG. 10 is a schematic representation of a network in accordance with another embodiment of the invention;

FIG. 11 is a schematic representation of a network in accordance with another embodiment of the invention;

FIG. 12 is another schematic representation of the network of FIG. 11;

FIG. 13 is a schematic representation of a network in accordance with another embodiment of the invention;

FIG. 14 is a schematic representation of a network in accordance with another embodiment of the invention;

FIG. 15 is another schematic representation of the network of FIG. 14;

FIG. 16 is another schematic representation of the network of FIG. 14;

FIG. 17 is a schematic representation of a network in accordance with another embodiment of the invention;

FIG. 18 shows a flow-chart depicting a method of obtaining network knowledge in accordance with another embodiment of the invention;

FIG. 19 is a schematic representation of a network in accordance with another embodiment of the invention;

FIG. 20 shows a flow-chart depicting a method of exchanging information to establish a connection between nodes in accordance with another embodiment of the invention;

FIG. 21 shows a flow-chart depicting an initialization process for a method of establishing a connection between nodes in accordance with another embodiment of the invention;

FIG. 22 is a schematic representation of a network showing the additive property of cumulative link cost for a method of spreading node knowledge in accordance with another embodiment of the invention;

FIG. 23 shows a flow-chart depicting the flow of node knowledge through a network for a method of spreading node knowledge in accordance with an embodiment of the invention;

FIG. 24 shows a flow-chart depicting the flow of node knowledge through a network for a method of spreading node knowledge in accordance with an embodiment of the invention;

FIG. 25 shows a flow-chart depicting the flow of node knowledge through a network for a method of spreading node knowledge in accordance with an embodiment of the invention;

FIG. 26 shows a flow-chart depicting the flow of node knowledge through a network for a method of spreading node knowledge in accordance with an embodiment of the invention;

FIG. 27 is a schematic representation of a network showing a method for detecting an isolated core in accordance with an embodiment of the invention;

FIG. 28 shows a flow-chart depicting a method for routing through a network using TCP/IP as an example of a protocol that can be emulated, in accordance with an embodiment of the invention;

FIG. 29 is a schematic representation of a network showing node A directly connected to nodes B and C; node C only connected to node A; and node B directly connected to four nodes;

FIG. 30 shows a flow-chart depicting how service time on a queue can be calculated in accordance with an embodiment of the invention;

FIG. 31 is a schematic representation of a network showing an arrangement of nodes and queues in accordance with an embodiment of the invention;

FIG. 32 shows a number of flow-charts depicting a series of steps showing knowledge of a queue propagating through a network in accordance with an embodiment of the invention;

FIG. 33 is a schematic representation of a network showing every node in the network having just become aware of the EUS created queue, in accordance with an embodiment of the invention;

FIG. 34 is a schematic representation of the network of FIG. 33 with one of the connections to the node with the EUS created queue removed;

FIG. 35 is a schematic representation of the network of FIG. 33 with the directly connected node that lost its connection to the node with the EUS created queue set to a latency of infinity;

FIG. 36 is a schematic representation of the network of FIG. 33 with all the node's ‘chosen destinations’ at infinity;

FIG. 37 is a schematic representation of the network of FIG. 33 with all nodes that can be set to infinity being set to infinity;

FIG. 38 is a schematic representation of the network of FIG. 33 with every node that has been set to infinity paused for a fixed amount of time, and then picking the lowest latency destination it sees that is not infinity;

FIG. 39 is a schematic representation of the network of FIG. 33 showing that as soon as a node that was at infinity becomes non-infinity it tells the nodes directly connected to it immediately;

FIG. 40 shows a flow-chart depicting the incoming latency update outlined in the schematic representations of FIGS. 33-39;

FIG. 41 shows a flow-chart depicting latency at infinity;

FIG. 42 is a schematic representation of a network showing the data stream on nodes between the ultimate sender and ultimate receiver;

FIG. 43 is a schematic representation of a network showing an example of a potential loop to be avoided;

FIG. 44 shows a chart comparing the median latency over a time period to the maximum latency over another time period;

FIG. 45 is a graph depicting bytes of data in queue over time, and showing minimum queue levels during time intervals;

FIG. 46 is a schematic representation of a network showing that when a node at capacity sees a GUID it sent to a possible additional chosen destination it knows that choice would be a bad choice;

FIG. 47 shows a flow-chart depicting a method of deciding when to add/remove a chosen destination while not ‘At Capacity’;

FIG. 48 is a schematic representation of a network showing a loop that was accidentally created in nodes not in the data stream;

FIG. 49 is a schematic representation of a network showing node A and node B negotiating so that node A can send to node B;

FIG. 50 is a schematic representation of a network showing how node A indicates it wants to send more data;

FIG. 51 is a schematic representation of a network showing how two nodes can negotiate transfers of messages when a quota is limited;

FIG. 52 is a schematic representation of a network showing how two nodes can negotiate transfers of messages when a quota is limited;

FIG. 53 is a schematic representation of a network showing how two nodes can negotiate transfers of messages when a quota is limited; and

FIG. 54 is a schematic representation of a network showing each node's next best step to the core, and that same network rearranged to better illustrate the hierarchy this process creates.

DETAILED DESCRIPTION OF THE INVENTION

Referring now to FIG. 1, a network in accordance with an embodiment of the invention is indicated generally at 30. Network 30 comprises a plurality of nodes N1, N2 and N3. Collectively, nodes N1, N2 and N3 are referred to as nodes N, and generically they are referred to as node N. This nomenclature is used for other elements discussed herein.

Node N1 is connected to node N2 via a first physical link L1. Node N2 is connected to node N3 via a second link L2. Node N1 is a neighbour to node N2 and likewise node N2 is a neighbour to node N1, since they are connected by link L1. By the same token, node N3 is a neighbour to node N2 and likewise node N2 is a neighbour to node N3, since they are connected by link L2. Thus, the term “neighbour” (and variants thereof, as the context requires) is used herein to refer to nodes N that are connected to another node N by a single link L.

Each node N is any type of computing device that is operable to communicate with another node N via a respective link L. Any type of computing device is contemplated, such as a personal computer (“PC”), a laptop computer, a personal digital assistant (“PDA”), a voice over internet protocol (“VOIP”) landline telephone, a cellular telephone, a smart sensor, etc., or combinations thereof. Different nodes N can be different types of computing devices.

Each link L is based on any type of communications link, or combinations or hybrids thereof, be they wired or wireless, including but not limited to OC3, T1, Code Division Multiple Access (“CDMA”), Orthogonal Frequency Division Multiplexing (“OFDM”), Global System for Mobile Communications (“GSM”), General Packet Radio Service (“GPRS”), Ethernet, 802.11 and its variants, Bluetooth etc.

It should now be understood that the types of computing devices used to implement a particular node N, and the types of links L therebetween, are not particularly limited, and that in general terms, each node N is operable to connect and communicate with any neighbouring nodes N via the respective link L therebetween.

Each node N maintains a network information database D that is configured to maintain knowledge about at least some of the other nodes N within network 30. Each database D is maintained in volatile storage (e.g. random access memory (“RAM”)) and/or non-volatile storage (e.g. hard disc drive) or combinations of volatile or non-volatile storage, in a computing environment associated with its respective node N. Database D is used by each node N to locate other nodes N in network 30, so that the particular node N can send traffic to that other node N and/or to share knowledge about those other nodes N.

Each database D is shown on FIG. 1 as an oval indicated with the reference D and located within its respective node N, to represent that node N maintaining its own respective database D. More particularly, database D1 is shown within node N1, database D2 is shown within node N2, and database D3 is shown within node N3. The size, complexity and other overhead metrics that define the structure of each database D are chosen so that a particular database D only occupies a portion of the overall computing resources that are available in its respective node N. The structure of database D is thus typically, though not necessarily, chosen to leave a significant portion of the computing resources of node N free to perform the regular computing tasks of that node N. Further details about such overhead metrics will be discussed in greater detail below.

However, for the exemplary network 30 in FIG. 1, it will be assumed that all nodes N have substantially equal computing resources and that all links L have substantially the same service characteristics. (As used herein, the term “service characteristics” as applied to links L includes any known quality of service (“QOS”) metrics, including bandwidth, latency, bit error rate, etc., that can be used to assess the quality of a link L. Service characteristics can also include pricing, in that the financial cost incurred to carry traffic over one link may be different than the financial cost to carry traffic over another link). It will thus be assumed that each database D has substantially the same structure—an example of such a structure being shown in Table I.

TABLE I
Exemplary Structure of each Database D

Row Number   Column 1   Column 2    Column 3
Heading      Rank       Node name   Best Neighbour

In general terms, each database D provides a list of at least some of the nodes N in network 30, other than the node N that is maintaining the particular database D (“other nodes N”). Each database D also ranks those other nodes N according to their importance within network 30. Metrics that reflect importance include, but are not limited to, the proximity of such other nodes N, and/or which of the other nodes N carries a proportionately greater share of traffic in network 30, and/or the proximity of a node N to a data flow going to another node N. Other metrics will now occur to those of skill in the art, some of which will be discussed in greater detail below. Each database D also identifies those other nodes N, and the neighbouring node N that represents the next best step to reach a respective other node N.

Explaining Table I in greater detail, Column 1 of Table I, “Rank”, indicates a number, increasing in value for each row in database D based on the number of other nodes N that are maintained in the particular database D.

In Column 2 of Table I, “Node name” identifies the specific other node N. Such a node name can be based on any known or future network addressing scheme. Examples of known node addressing schemes include telephone numbers, or Medium Access Control (“MAC”) addresses, or Internet Protocol (“IP”) addresses. Such an addressing scheme can be chosen according to other factors affecting the design of network 30 and/or the nodes N therein. Of note, however, in the addressing scheme the name of each node N need not reflect the location of that node N in the network, as is found in other addressing schemes—e.g. telephone numbers that have area codes corresponding to a geographic location. In order to simplify explanation of the embodiments herein, the node name is identified according to the reference character in the Figures. For example, where a Node Name entry under Column 2 indicates “N1”, then node N1 is being identified.

In Column 3 of Table I, “Best Neighbour” indicates which of the neighbour nodes N provides the next best step in an overall route to reach the other node N named in Column 2. [In the present embodiment, the term “Best Neighbour” is used, but this should not be construed in a limiting sense for all embodiments of the invention, in that any desired criteria to determine a “Best Neighbour” or otherwise desired neighbour can be chosen.] Thus, Column 3 will always identify a neighbour node N, while Column 2 need not indicate a neighbour node N. It should be understood that entries in Column 3 need not actually be the name of the neighbour node N, according to the same addressing scheme used for Column 2, but can be any indicator of that particular neighbour node N. However, to simplify explanation of the embodiments herein, entries in Column 3 will actually reflect the name of the neighbour node N.
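Purely as an illustrative sketch, one row of the database D of Table I could be represented in software as follows (the field names are assumptions made for the example, not a required implementation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DatabaseRow:
    """One row of the network information database D of Table I."""
    rank: Optional[int]            # Column 1: importance of this entry
                                   # (None for the owner's own "null" row)
    node_name: str                 # Column 2: identity of the other node N
    best_neighbour: Optional[str]  # Column 3: neighbouring node N that is the next
                                   # best step to reach it (None for the owner's own row)
```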

When network 30 is initialized (e.g. when all of the nodes N each connect to each other according to the topology shown in FIG. 1), the contents of each database D will be empty, except that each database D will contain a “null” entry identifying the particular node N that owns the particular database D. Table II thus shows how database D1 is initially populated with a “null” entry, identifying node N1.

TABLE II
Initial contents of Database D1

Row Number   Column 1   Column 2    Column 3
Heading      Rank       Node name   Best Neighbour
Row 0        Ø          N1          N/A

Explaining Table II in greater detail, in Row 0 of Column 1 of Table II, the entry is given a null entry of “Ø”, to indicate that this particular information in database D1 is about the actual node N1 that owns that database D1. In Row 0 of Column 2 of Table II, the entry is “N1”, to identify node N1 by name. In Row 0 of Column 3 of Table II, the entry is “N/A”, to indicate that the best neighbour is inapplicable, since this entry of Table II refers to the owner of database D1.

Likewise, Table III thus shows how database D2 is initially populated with a “null” entry, identifying node N2.

TABLE III
Initial contents of Database D2

Row Number   Column 1   Column 2    Column 3
Heading      Rank       Node name   Best Neighbour
Row 0        Ø          N2          N/A

Likewise, Table IV thus shows how database D3 is initially populated with a “null” entry, identifying node N3.

TABLE IV
Initial contents of Database D3

Row Number   Column 1   Column 2    Column 3
Heading      Rank       Node name   Best Neighbour
Row 0        Ø          N3          N/A

In order to populate a remainder of each database D, and maintain their contents, a microprocessor on each node N will perform a set of programming instructions. Those instructions can be substantially the same for each node N. Referring now to FIG. 2, a flowchart representing a method for maintaining network knowledge in accordance with an embodiment of the invention is indicated generally at 200. Method 200 can be implemented into a set of programming instructions for execution on a microprocessor on each node N to populate and maintain the contents of each database D. In order to assist in the explanation of the method, it will thus be assumed that method 200 is operated on each node N in system 30 in order to maintain the database D respective to that node N. The following discussion of method 200 will thus lead to further understanding of system 30 and its various components. (However, it is to be understood that system 30 and/or method 200 can be varied, and need not be performed in the exact sequence shown in FIG. 2, and that system 30 and/or method 200 need not work exactly as discussed herein, and that such variations are within the scope of the present invention.)
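As a rough, non-limiting sketch only, the cycle of method 200 could be expressed as the following loop, assuming a node object that exposes one operation per step (the method and parameter names are invented for the illustration):

```python
import time

def run_method_200(node, cycle_delay=1.0):
    """Continuously populate and maintain a node's database D.

    'node' is assumed to expose detect_neighbours(), exchange_knowledge()
    and update_local_knowledge(); these names are illustrative only and
    correspond to steps 210, 220 and 230 of method 200.
    """
    while True:
        neighbours = node.detect_neighbours()            # step 210
        exchanged = node.exchange_knowledge(neighbours)  # step 220
        node.update_local_knowledge(exchanged)           # step 230
        time.sleep(cycle_delay)                          # then cycle back to step 210
```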

Thus, before beginning explanation of method 200, it will be assumed that database D for each node N has been populated only according to Tables II, III and IV, and that the nodes N have been activated and are physically connected to each other according to the structure of links L shown in FIG. 1.

Beginning first at step 210, the presence of neighbours is determined. In general terms, at step 210 each node N determines whether it has any new neighbouring nodes N, or whether any existing neighbouring nodes N have ceased connecting to that node N. When step 210 is first performed by node N1, node N1 will thus send out an initialization message over link L1 to node N2 in order to query the existence of node N2 at the end of link L1. Such an initialization message can be performed according to any known means corresponding to the type of protocol used to implement link L1.

Likewise, step 210 will also be performed by node N2, and node N2 will thus send out a network initialization signal over link L1 to node N1 in order to query the existence of node N1. By the same token, node N2 will thus send out a network initialization signal over link L2 to node N3 in order to query the existence of node N3.

Finally, step 210 will also be performed by node N3, and node N3 will thus send out a network initialization signal over link L2 to node N2 in order to query the existence of node N2.

Referring now to FIG. 3, this initial performance of step 210 by each node N is represented by showing a plurality of initialization messages IM being sent according to the above. Specifically, initialization message IM1-2 is being sent from node N1 to node N2; initialization message IM3-2 is being sent from node N3 to node N2; initialization message IM2-3 is being sent from node N2 to node N3; and initialization message IM2-1 is being sent from node N2 to node N1. In a present embodiment, initialization messages IM do not exchange node knowledge, in order to simplify initialization messages IM and allow node knowledge of a node N to spread in substantially the same manner for all nodes N. This initialization message IM can contain processing and memory characteristics of node N as they relate to the node's ability to maintain network knowledge. Such processing and memory characteristics can include the memory of the node N that is dedicated to maintaining network knowledge, and the like. In the present embodiment, however, node names themselves are not exchanged as part of the initialization messages IM.
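For illustration only, an initialization message IM might carry fields along the following lines; the specific fields and names are assumptions for the sketch and are not mandated by the embodiment, which requires only that node names not be carried:

```python
from dataclasses import dataclass

@dataclass
class InitializationMessage:
    """A sketch of an initialization message IM sent over a newly detected link.

    Carries capacity information only; node names are deliberately absent so
    that node knowledge spreads in the same manner for all nodes N.
    """
    sending_link: str            # identifier of the link L the message travels over
    knowledge_memory_bytes: int  # memory dedicated to maintaining network knowledge
    processing_capacity: float   # relative processing ability of the sending node
```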

As a result of locating neighbours using initialization messages IM, each node N will now be aware of its neighbouring nodes N, and thus be in a position to begin populating and maintaining its respective database D by making use of neighbouring databases D.

Thus, referring again to FIG. 2, method 200 will advance from step 210 to step 220, at which point network knowledge will be exchanged between neighbour nodes N, such neighbours having been identified at step 210. Each node N can now make use of a neighbouring database D to gain more knowledge about network 30.

Referring now to FIG. 4, the initial performance of step 220 by each node N is represented by showing a set of bi-directional knowledge exchange messages KEM. The knowledge exchange between node N1 and node N2 is indicated as knowledge exchange message KEM1-2, while the knowledge exchange between node N2 and node N3 is indicated as knowledge exchange message KEM2-3.

Referring again to FIG. 2, method 200 then advances from step 220 to step 230, at which point local knowledge is updated as a result of the information exchange from step 220. As a result of exchanging messages KEM, databases D1, D2 and D3 can be updated to reflect information about neighbouring nodes N, as shown in Tables V, VI and VII respectively. Table V thus shows how database D1 is now populated after the initial performance of step 230 by node N1.
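One simple way the step 230 update could be performed is sketched below, assuming the database D is held as a mapping from node name to Best Neighbour (the representation and function name are assumptions made for the example):

```python
def update_local_knowledge(database, neighbour_name, names_known_by_neighbour, own_name):
    """Fold one knowledge exchange message KEM into the local database D.

    database: dict mapping node name -> Best Neighbour for this node.
    neighbour_name: the directly connected node the exchange came from.
    names_known_by_neighbour: node names listed in that neighbour's database.
    own_name: this node's own name, which is never added as an "other node".
    Returns True if any new node was learned of.
    """
    learned = False
    for other in names_known_by_neighbour:
        if other == own_name or other in database:
            continue
        # The neighbour can reach 'other', so it becomes our next best step to it.
        database[other] = neighbour_name
        learned = True
    return learned
```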

TABLE V (Updated from Table II)
Database D1

Row Number   Column 1   Column 2    Column 3
Heading      Rank       Node name   Best Neighbour
Row 0        Ø          N1          N/A
Row 1        1          N2          N2

Explaining Table V in greater detail, Row 0 remains the same as in Table II. However, Row 1 is now populated, showing that node N1 now has knowledge of a node named node N2, and that node N2 is the best neighbour through which node N2 can be reached.

Likewise, Table VI thus shows how database D2 is now populated after the initial performance of step 230 by node N2.

TABLE VI (Updated from Table III)
Database D2

Row Number   Column 1   Column 2    Column 3
Heading      Rank       Node name   Best Neighbour
Row 0        Ø          N2          N/A
Row 1        1          N1          N1
Row 2        2          N3          N3

Explaining Table VI in greater detail, Row 0 remains the same as in Table III. However, Row 1 is now populated, showing that node N2 now has knowledge of a node named node N1, and that node N1 is the best neighbour through which node N1 can be reached. By the same token, Row 2 is now populated, showing that node N2 now has knowledge of a node named node N3, and that node N3 is the best neighbour through which node N3 can be reached. Note that node N1 has been given a rank of “1”, while node N3 has been given a rank of “2”. In the present example, such rankings were made purely as a matter of convenience, given that no metrics exist by which to actually choose which to rank higher. However, rankings made on more complex bases will be discussed in greater detail below.

Likewise, Table VII thus shows how database D3 is populated after the initial performance of step 230 by node N3.

TABLE VII (Updated from Table IV)
Database D3

Row Number   Column 1   Column 2    Column 3
Heading      Rank       Node name   Best Neighbour
Row 0        Ø          N3          N/A
Row 1        1          N2          N2

Explaining Table VII in greater detail, Row 0 remains the same as in Table IV. However, Row 1 is now populated, showing that node N3 now has knowledge of a node named node N2, and that node N2 is the best neighbour through which node N2 can be reached.

The contents of Tables V, VI and VII are shown as knowledge paths K, represented by dotted lines in FIG. 5. Knowledge path K1-2 corresponds with Row 1 of Table V, indicating that node N1 has knowledge of N2; knowledge path K2-1 corresponds with Row 1 of Table VI, indicating that node N2 has knowledge of N1; likewise knowledge path K2-3 corresponds with Row 2 of Table VI, indicating that node N2 has knowledge of node N3; and knowledge path K3-2 corresponds with Row 1 of Table VII, indicating node N3 has knowledge of node N2.

Payload traffic generated at an origin node N that is intended for a destination node N can now actually be delivered to nodes N in accordance with knowledge paths K, where a knowledge path exists between an origin node N and a destination node N. Such delivery of payload traffic can be effected via the best neighbour routings shown in Column 3, to the extent that Column 2 is populated in the database D of the origin node N with network knowledge about the destination node N.

(As used herein, “payload traffic” or “payload” refers to any data generated by an application executing on the origin node N that is intended for a destination node N. For example, where nodes N are computers, then payload traffic can include emails, web pages, application files, printer files, audio files, video files or the like. Where nodes N are telephones, then payload traffic can include voice transmissions. Other types of payload data will now occur to those of skill in the art.)

More specifically, nodes N1 and N2 can now exchange payload traffic, since they have knowledge of each other. Nodes N2 and N3 can also exchange payload traffic, since they have knowledge of each other. However, at this point, nodes N1 and N3 cannot exchange traffic since they do not have knowledge of each other.

Having now completely performed method 200 once, method 200 then cycles back from step 230 to step 210, where method 200 begins anew for the second time. Returning again to step 210, the presence of neighbours is determined. During this second exemplary cycle through method 200, it will be assumed that no new nodes N are added to network 30, and no existing nodes N are removed. Accordingly, nothing occurs at step 210 since no changes have occurred, and method 200 advances from step 210 to step 220.

Continuing with the present example, referring again to FIG. 2, method 200 will advance again from step 210 to step 220, at which point additional network knowledge will be exchanged between neighbour nodes N. Once again, each node N can now make use of a neighbouring database D to gain more knowledge about network 30.

Referring now to FIG. 6, the second performance of step 220 by each node N is once again represented by bi-directional knowledge exchange messages KEM. The knowledge exchange between node N1 and node N2 is indicated as knowledge exchange message KEM1-2, while the knowledge exchange between node N2 and node N3 is indicated as knowledge exchange message KEM2-3.

Referring again to FIG. 2, method 200 then advances, for the second time, from step 220 to step 230, at which point local knowledge is updated as a result of the information exchange from step 220. As a result of exchanging messages KEM, databases D1, D2 and D3 can be updated to reflect information about neighbouring nodes N, as shown in Tables VIII, IX and X respectively. Table VIII thus shows how database D1 is now populated after the second performance of step 230 by node N1.

TABLE VIII (Updated from Table V)
Database D1

Row Number   Column 1   Column 2    Column 3
Heading      Rank       Node name   Best Neighbour
Row 0        Ø          N1          N/A
Row 1        1          N2          N2
Row 2        2          N3          N2

Explaining Table VIII in greater detail, Rows 0 and 1 remain the same as in Table V. However, Row 2 is now populated, showing that node N1 now has knowledge of a node named node N3, and that node N2 is the best neighbour through which node N3 can be reached.

Likewise, Table IX thus shows how database D2 is now populated after the second performance of step 230 by node N2.

TABLE IX (Updated from Table VI)
Database D2

Row Number   Column 1   Column 2    Column 3
Heading      Rank       Node name   Best Neighbour
Row 0        Ø          N2          N/A
Row 1        1          N1          N1
Row 2        2          N3          N3

Explaining Table IX in greater detail, Rows 0, 1 and 2 remain the same as in Table VI, since there are no new nodes N in network 30 for node N2 to become aware of through exchanging messages with its neighbouring nodes N.

Likewise, Table X thus shows how database D3 is populated after the second performance of step 230 by node N3.

TABLE X (Updated from Table VII)
Database D3

Row Number   Column 1   Column 2    Column 3
Heading      Rank       Node name   Best Neighbour
Row 0        Ø          N3          N/A
Row 1        1          N2          N2
Row 2        2          N1          N2

Explaining Table X in greater detail, Rows 0 and 1 remain the same as in Table VII. However, Row 2 is now populated, showing that node N3 now has knowledge of a node named node N1, and that node N2 is the best neighbour through which node N1 can be reached.

The contents of Tables VIII, IX and X are shown as knowledge paths K, represented by dotted lines in FIG. 7. In FIG. 7 (and as previously shown in FIG. 5), knowledge path K1-2 indicates that node N1 has knowledge of N2; knowledge path K2-1 indicates node N2 has knowledge of N1; likewise knowledge path K2-3 indicates that node N2 has knowledge of node N3; and knowledge path K3-2 indicates node N3 has knowledge of node N2. However, FIG. 7 also now includes two additional knowledge paths: knowledge path K1-3 indicates that node N1 now has knowledge of node N3, and likewise knowledge path K3-1 indicates that node N3 now has knowledge of node N1.

Payload traffic generated at an origin node N that is intended for a destination node N can now actually be delivered to nodes N in accordance with knowledge paths K, where a knowledge path exists between an origin node N and a destination node N. Such delivery of payload traffic can be effected via the best neighbour routings shown in Column 3, to the extent that Column 2 is populated in the database D of the origin node N with network knowledge about the destination node N. Thus, more specifically, all nodes N can now exchange payload traffic, since they have knowledge of each other. Of particular note, after this pass through method 200, node N1 and node N3 can send payload traffic to each other, via node N2 as the step between them.

Having now completely performed method 200 twice, method 200 then cycles back from step 230 to step 210, where method 200 begins anew. Prior to the performance of the third exemplary cycle through method 200, it will be assumed that node N3 is removed from network 30 due to a failure of link L2, as represented in FIG. 8. Returning again to step 210, the presence of neighbours is determined. This third time, during the exchange of initialization messages IM, nodes N2 and N3 will each determine that the other is no longer a neighbour. At step 220 knowledge is exchanged with neighbours according to the neighbours found present at step 210. Finally, at step 230, local knowledge is updated based on the exchange.

After step 230, and as shown in FIG. 8, the result is that database D1 remains the same, maintaining the contents as shown in Table VIII, because insufficient cycles of method 200 have occurred for the loss of node N3 to propagate to database D1. However, database D2 is now updated in accordance with Table XI.

TABLE XI (Updated from Table IX)
Database D2

Row Number   Column 1   Column 2    Column 3
Heading      Rank       Node name   Best Neighbour
Row 0        Ø          N2          N/A
Row 1        1          N1          N1

Database D3 is also updated to reflect the initial data found in Table IV. This is represented in FIG. 8.

The contents of the databases D after this third pass of method 200 are reflected by the knowledge paths K shown in FIG. 8.

During a fourth pass of method 200, the loss of node N3 will finally propagate to node N1, resulting in the knowledge paths K shown in FIG. 9.

(Those of skill in the art will recognize that the foregoing is a simplified explanation for purposes of illustration, which when implemented can cause the introduction of a trivial loop. To address this, a ‘poison reverse’ can be introduced to get rid of the trivial loop that gets introduced in any network when a node is removed. A poison reverse is discussed in greater detail below. To further reduce introductions of loops, a delay can be introduced during the spread of node knowledge, while implementing a ‘zero’ delay (e.g. substantially instantaneous) removal of node knowledge. Finally, when the distance from data flow (discussed in greater detail below) reaches a certain limit, a node informs its neighbouring nodes to remove knowledge of that particular node even if it still has valid knowledge of that node. A more detailed discussion of node removal is provided further below.)
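To make the idea of a poison reverse concrete, the following minimal sketch assumes each database entry also carries a cost, and that a node preparing a knowledge exchange for a particular neighbour advertises an infinite cost for any destination it learned through that same neighbour; the data layout and names are illustrative assumptions, not the detailed removal procedure described later:

```python
INFINITY = float('inf')

def build_advertisement(for_neighbour, database):
    """Prepare the knowledge to send one neighbour, applying a poison reverse.

    database: dict mapping destination name -> (best_neighbour, cost).
    Destinations whose best neighbour is the very node being updated are
    advertised back to it as unreachable, so that removing a node does not
    leave a trivial two-node loop behind.
    """
    advertisement = {}
    for destination, (best_neighbour, cost) in database.items():
        if best_neighbour == for_neighbour:
            advertisement[destination] = INFINITY  # poison reverse
        else:
            advertisement[destination] = cost
    return advertisement
```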

It should now be understood that the teachings herein are applicable to networks of greater complexity than network 30. For example, referring now to FIG. 10, a slightly more complex network in accordance with another embodiment of the invention is indicated generally at 30a. Network 30a includes substantially the same elements as network 30, and like elements include like references but followed by the suffix “a”. More specifically, network 30a includes more nodes Na and links La, but the basic structure of those nodes Na and links La is substantially the same as their counterparts in system 30. To simplify explanation, however, network 30a is shown without specific tables showing the contents of databases Da.

Network 30a includes nodes N1a, N2a and N3a that are connected via links L1a and L2a like their respective counterparts nodes N1, N2 and N3 in network 30. In this example, it is initially assumed that network 30a has undergone two complete passes through method 200 and thus databases Da are in the same state as shown for network 30 in FIG. 7. In contrast to network 30, however, it is also assumed that network 30a includes a fourth node N4a that is, initially, not connected to any other node Na.

The embodiments of the invention described herein include a self-organizing network and a computer readable medium for storing a set of programming instructions for execution within a self-organizing network. Referring now to FIG. 11, assume that node N4a joins the rest of network 30a by the formation of link L3a spanning node N4a and node N2a, and by the formation of link L4a spanning node N4a and node N3a. After a sufficient number of cycles of method 200 are performed by each node Na, additional knowledge paths Ka (as shown in FIG. 11) will form according to the updated contents of databases Da, as aggregated in Table XII.

TABLE XII
Databases Da

              Database D1a                  Database D2a                  Database D3a                  Database D4a
     Col. 1   Col. 2   Col. 3      Col. 4   Col. 5   Col. 6      Col. 7   Col. 8   Col. 9      Col. 10  Col. 11  Col. 12
              Node     Best                 Node     Best                 Node     Best                 Node     Best
Row  Rank     Name     Neighbour   Rank     Name     Neighbour   Rank     Name     Neighbour   Rank     Name     Neighbour
1    Ø        N1a      N/A         Ø        N2a      N/A         Ø        N3a      N/A         Ø        N4a      N/A
2    1        N2a      N2a         1        N3a      N3a         1        N2a      N2a         1        N2a      N2a
3    2        N3a      N2a         2        N4a      N4a         2        N4a      N4a         2        N3a      N3a
4    3        N4a      N2a         3        N1a      N1a         3        N1a      N2a         3        N1a      N2a

Payload traffic generated at an origin node Na that is intended for a destination node Na can now actually be delivered in accordance with knowledge paths Ka. For example, assume that node N4a wishes to send payload traffic to node N1a. Using the information in Table XII, it can be seen that traffic will be routed to node N1a from node N4a via node N2a. This traffic path P is shown in FIG. 12, which shows network 30a in the same state as FIG. 11, but with knowledge paths Ka removed so that traffic path P can be seen more clearly.

At this point it can be noted that various nodes can reach other nodes through different paths, even though certain preferred paths have been identified. Such preferred paths have been chosen since the embodiments thus far have assumed that all links L and La have substantially the same service characteristics. For example, in FIGS. 11 and 12, corresponding to Table XII, node N4a reaches node N1a via node N2a. This is reflected in Table XII at Row 4, Column 12, wherein node N2a is reflected as the next best neighbour to reach node N1a from node N4a. However, while less preferred in the example shown in Table XII, it is physically possible for payload traffic to be delivered along the path from node N4a, via node N3a and node N2a, before final delivery to node N1a, which is the path that would be used if link L3a did not exist.

However, in another embodiment, service characteristics for each link can vary, and databases for each node incorporate knowledge of such service characteristics when selecting a best neighbour as a next best step through which to route payload traffic. For example, referring now to FIG. 13, another network in accordance with another embodiment of the invention is indicated generally at 30b. Network 30b is substantially the same as network 30a, and like elements include like references but followed by the suffix “b”. More specifically, network 30b includes links Lb, which follow the same paths as links La in network 30a. Also, network 30b includes four nodes Nb, which are substantially the same as nodes Na in network 30a. However, in network 30b each link Lb has different service characteristics, whereas in network 30a each link La has the same service characteristics. Table XIII shows an exemplary set of service characteristics for each link Lb.

TABLE XIII
Service Characteristics for Links Lb

      Column 1   Column 2             Column 3
Row   Link       Bandwidth            Cost
1     L1b        1 Megabit/second     $0.10 per kilobyte
2     L2b        10 Megabit/second    $0.05 per kilobyte
3     L3b        0.5 Megabit/second   $0.10 per kilobyte
4     L4b        10 Megabit/second    $0.20 per kilobyte

Explaining Table XIII in greater detail, Column 1 identifies the particular link Lb in question. Column 2 identifies the bandwidth of the link Lb identified in the same row. Column 3 indicates the financial cost for carrying traffic over a particular link Lb, in dollars per kilobyte. (It should now be understood that Table XIII can include any other service characteristics that are desired, such as bit error rate, latency etc.) The information for each link Lb can thus be made part of each database Db, and propagated through network 30b using method 200 or a suitable variant thereof, in much the same manner as node knowledge can be propagated throughout network 30b.

Databases Db respective to each node Nb know the details of each link Lb to which they are directly connected. For example, node N4b will know the details of links L3b and L4b as shown in Table XIII. By the same token, node N3b will know the details of links L4b and L2b. In a present embodiment, each node Nb only knows about itself and the links Lb that it has to directly connected nodes Nb, but each node Nb need not know anything about the overall network topology.

However, the databases Db respective to the nodes Nb on either end of a particular pathway will know the cumulative service characteristics associated with the links Lb that define that pathway, once each such database Db has knowledge of the node at the other end. Thus, once node N4b knows about node N1b, node N4b will also know the cumulative service characteristics (and therefore the cumulative ‘cost’) of all links Lb between node N4b and node N1b.

Thus, once a particular node Nb has information about the characteristics of a particular link, then that node Nb can use such information in order to determine the “Best Neighbour” as the next best step through which to route traffic. For example, in Table XIII it can be seen that the bandwidth of link L3b is only 0.5 Megabits/second—whereas the bandwidth of link L4b and link L2b are both ten Megabits/second. Thus, payload traffic sent from node N4b to node N2b will be delivered to node N2b much faster if it is sent via node N3b, rather than if it is sent directly over link L3b.
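The comparison can be illustrated with a small sketch that scores each candidate next step by the slowest link along its path (the data used is simply the bandwidth column of Table XIII; the function name is invented for the example):

```python
def fastest_next_step(candidates):
    """Choose the candidate next step whose slowest link is fastest.

    candidates: dict mapping a candidate next-step neighbour to the list of
    per-link bandwidths (Megabit/second) along the path through it.
    """
    return max(candidates, key=lambda nxt: min(candidates[nxt]))

# Node N4b choosing how to reach node N2b, per Table XIII:
choice = fastest_next_step({
    'direct over L3b': [0.5],       # bottleneck 0.5 Megabit/second
    'via N3b (L4b, L2b)': [10, 10]  # bottleneck 10 Megabit/second
})
# choice == 'via N3b (L4b, L2b)'
```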

Thus, using Table XIII, node N4b can determine that node N3b is the next best step to reach both node N2b and node N1b, if speed of delivery of payload traffic is a priority. Table XIV thus shows how a portion of databases Db would appear if node N4b made such a determination (assuming that the information in Table XIII is not shown in Table XIV).

TABLE XIV
Databases Db

              Database D1b                  Database D2b                  Database D3b                  Database D4b
     Col. 1   Col. 2   Col. 3      Col. 4   Col. 5   Col. 6      Col. 7   Col. 8   Col. 9      Col. 10  Col. 11  Col. 12
              Node     Best                 Node     Best                 Node     Best                 Node     Best
Row  Rank     Name     Neighbour   Rank     Name     Neighbour   Rank     Name     Neighbour   Rank     Name     Neighbour
1    Ø        N1b      N/A         Ø        N2b      N/A         Ø        N3b      N/A         Ø        N4b      N/A
2    1        N2b      N2b         1        N3b      N3b         1        N2b      N2b         1        N2b      N3b
3    2        N3b      N2b         2        N4b      N4b         2        N4b      N4b         2        N3b      N3b
4    3        N4b      N2b         3        N1b      N1b         3        N1b      N2b         3        N1b      N3b

By the same token, FIG. 13 shows the path Pb of payload traffic for traffic originating from node N4b destined for node N2b, based on the contents of database D4b as shown in Table XIV. In FIG. 13, path Pb does not travel via link L3b, but instead travels via links L4b and L2b. It should now be understood that complex and multiple criteria can be employed when determining the best neighbour through which to route traffic. Table XIV can thus be populated by optimizing the service characteristics of links Lb, optimizing for bandwidth, cost, bit error rate, etc.

Of course, Table XIV would change if the best neighbour was chosen based on the next best step having the least financial cost, ignoring bandwidth altogether. Referring again to Table XIII, in this case, since link L3b is financially less expensive than link L4b, node N4b would choose node N2b as its next best step to reach node N2b and node N1b, and thus database D4b would appear the same as database D4a in Table XII.

It should now be apparent that the next best step can be based on a set of complex criteria for evaluating each link—for example, some overall rating of a link Lb can be determined by combining Columns 2 and 3 of Table XIII, to provide a service characteristic rating that is a combination of both bandwidth and financial cost for a particular link.
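As one hedged illustration of such a combined rating, bandwidth could raise the rating while cost per kilobyte lowers it; the weights and the linear form below are assumptions for the sketch, not a prescribed formula:

```python
def link_rating(bandwidth_mbps, cost_per_kb, bandwidth_weight=1.0, cost_weight=1.0):
    """Combine Column 2 (bandwidth) and Column 3 (cost) of Table XIII into one rating.

    Higher bandwidth raises the rating; higher financial cost lowers it.
    """
    return bandwidth_weight * bandwidth_mbps - cost_weight * cost_per_kb

# With the Table XIII figures:
# link L3b: link_rating(0.5, 0.10) == 0.4
# link L4b: link_rating(10, 0.20)  == 9.8
```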

It is again to be emphasized that the teachings herein are applicable to networks of greater complexity than networks 30, 30a and 30b. For example, referring now to FIG. 14, a more complex network in accordance with another embodiment of the invention is indicated generally at 30c. Network 30c includes the same types of elements as networks 30, 30a and 30b, and like elements include like references but followed by the suffix “c”. Of note, in this embodiment it is assumed that all links Lc have substantially the same length and substantially the same service characteristics, though in other embodiments links Lc can have varying lengths and service characteristics, similar to links Lb. Network 30c includes more nodes Nc and links Lc, and, to simplify explanation, network 30c is shown without specific tables showing the contents of databases Dc.

In contrast to networks 30, 30a and 30b, however, in network 30c it is assumed that databases Dc have only a limited number of rows in order to set an upper limit on the memory resources of each node Nc that will be consumed by its respective database Dc. Thus, in this network 30c, each node does not maintain knowledge about the entire network 30c, but only a portion of the network 30c. (Such a configuration is in fact presently preferred when the teachings herein are applied to networks of a size where knowledge of the entire network results in an impractically large consumption of the overall computing resources of a given node.) For purposes of assisting in explanation, it will be assumed that each database Dc can store ten rows of information. The first row is the null row as previously described in relation to Table II, which identifies the node Nc to which a particular database Dc belongs. The remaining nine rows allow the database Dc to maintain knowledge of nine other nodes Nc within network 30c.

Note that it is not necessary for each node Nc to have the same capacity for storage, and such capacity need not be fixed but can be dynamically allocated, either automatically or manually, as the needs of a particular node Nc change, but nodes Nc in network 30c are constructed to a limit of nine other nodes for explanation purposes.

Databases Dc for each node Nc maintain a concept of a “core”. Where a specific node Nc is not included in a particular database Dc, then the core represents a default path by which that given node Nc may be located. As shown in FIG. 14, network 30c includes a core Cc which lies along link L6c, the details of which will be discussed in greater detail below. In general, it is presently preferred to ensure that the aggregate storage capacity of at least the databases Dc that comprise the core Cc is sufficient to ensure that the databases Dc that define the core Cc have knowledge of every node Nc within the network 30c. Accordingly, the size of the network according to the architecture of network 30c will thus complement the collective storage capacity of the two nodes Nc that define the core Cc. Thus, in the present example, collectively, node N6c and node N9c have sufficient capacity such that the nine rows in each of databases D6c and D9c are sufficient to maintain knowledge of every node within network 30c.

Thus, while each node Nc performs method 200, it will “hear” of more other nodes Nc than that node Nc will store. Accordingly, each node Nc is also operable to perform a prioritization operation to choose which nine other nodes Nc within network 30c to maintain knowledge of within its database Dc. Such a prioritization operation can be based on any prioritization criterion or criteria, or combinations thereof, such as which other nodes Nc are closest, which other nodes Nc carry the most traffic, to which other nodes Nc that particular node typically sends payload traffic, etc., and such other criteria as will now occur to those of skill in the art. Such prioritization criteria thus also provide the “rank” of each node Nc in order of importance, thereby defining the order in which the database Dc is populated, and the order in which node knowledge should be sent to other nodes Nc in the network.

In the present example, it will be assumed that the prioritization criteria for each node Nc are to populate its database Dc in order to maintain knowledge of:

(a) the other nodes Nc that are closest to that node Nc (“proximal nodes Nc”). Proximal nodes Nc are ranked in order of proximity;

(b) originating or destination nodes Nc of which the node Nc must have knowledge in order to pass payload traffic on behalf of that originating or destination node Nc (“originating or destination nodes Nc”). Originating and destination nodes are ranked according to the amount of payload traffic being carried on their behalf, and supersede proximal nodes;

(c) the other nodes Nc with which that node Nc sends or receives payload traffic (“payload traffic nodes Nc”). Payload traffic nodes Nc are ranked according to the importance of a particular payload traffic in relation to another, and supersede all proximal nodes and supersede up to half of the originating or destination nodes Nc. Importance of payload traffic can be based on volume of traffic, or speed of traffic, or the like;

(d) up to a maximum of nine other nodes Nc during any particular cycle of method 200.
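Purely as a sketch of how a capacity-limited database Dc could apply such a ranking, the following assumes each candidate entry carries a numeric priority derived from criteria like those above (the representation, the priority computation and the function name are assumptions for the illustration):

```python
def admit_with_capacity(database, candidate, max_other_nodes=9):
    """Admit a newly heard-of node, keeping only the highest-priority entries.

    database: list of entries, each a dict carrying at least a numeric
    'priority' (higher means more important to keep) alongside the node
    name and best neighbour fields.  When the limit is exceeded, the
    lowest-priority entries are forgotten.
    """
    database.append(candidate)
    if len(database) > max_other_nodes:
        database.sort(key=lambda entry: entry['priority'], reverse=True)
        del database[max_other_nodes:]  # drop the least important entries
```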

It is to be reiterated that the foregoing prioritization criteria are simplified for purposes of explanation of the present embodiment. In another, more presently preferred embodiment, nodes are ordered by their distance from a marked data stream value, except in such cases where:

-   1. This node is in the path of a High Speed Propagation Path (“HSPP”, which is discussed in greater detail below) for this destination node, and this directly connected node is:
    -   in the path to the core and the HSPP is a notify HSPP; or
    -   one of the nodes that told us of this HSPP and the HSPP is a request HSPP.
-   2. This node is marked in the data stream for this destination node. If a node is marked in the data stream it will tell its directly connected nodes that have not marked it in the data stream a Distance from Data Stream (also referred to herein as a Distance From Stream or “DFS”) of 0. Those that have marked it in the data stream will be told a DFS equal to the Link Cost (“LC”) associated with the Service Characteristics of the links to the destination node. This will be explained in greater detail below.

The formation of core Cc in network 30c will now be explained. In network 30c, it is initially assumed that nodes N2c through N13c are connected by links L1c through L11c, as shown in FIG. 14. It is also assumed that nodes N1c and N14c are initially not connected to the remainder of network 30c.

It is also assumed that, initially, no node Nc is attempting to send payload traffic to another node Nc, and that method 200 has been performed by each of nodes N2c through N13c to populate their respective databases Dc, subject to the prioritization criteria described above. FIG. 15 shows the other nodes Nc with which database D2c will be populated, represented as a closed dashed Figure and referred to herein as knowledge path block K2c-xc. Knowledge path block K2c-xc surrounds all of the other nodes Nc of which node N2c is aware, i.e. nodes N3c-N10c, and node N12c. FIG. 16 shows the other nodes Nc with which database D6c will be populated, represented as a closed dashed Figure and referred to herein as knowledge path block K6c-xc. Knowledge path block K6c-xc surrounds all of the other nodes Nc of which node N6c is aware, i.e. nodes N2c-N5c, nodes N7c-N11c, and node N12c. While not shown in the Figures, those of skill in the art will now appreciate the contents of the other databases Dc at this point in the present example.

At this point, it is also useful to note that payload traffic between any of nodes N2c-N6c and any of nodes N8c-N13c will all need to pass through link L6c. Thus, for this particular network, link L6c represents the “core” of network 30c at this point in the example. The core is shown in FIG. 14 as an ellipse encircling link L6c and indicated at Cc. The fact that link L6c is specifically the core Cc of network 30c need not be expressly maintained in each database Dc. Rather, each database Dc will determine a “Best Neighbour” indicating a neighbour that is the next best step in order to reach core Cc. The “Best Neighbour” to reach core Cc can be determined by examining database Dc to find which neighbouring node Nc is most frequently referred to as the “Best Neighbour” to reach the other nodes that are expressly maintained in database Dc. In the event that no neighbouring node Nc appears more frequently as a Best Neighbour, then the Best Neighbour appearing in Row 1, associated with the top-most ranked other node, can be selected as the Best Neighbour to reach the core of network 30c. A core is formed any time that two neighbouring nodes Nc point to each other as being the Best Neighbour to reach the core.
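By way of illustration only, the following sketch shows one way such a determination might be implemented; the structure and function names are hypothetical and are not part of the architecture itself. The database rows are assumed to be stored in rank order, with the top-ranked other node first.

#include <map>
#include <string>
#include <vector>

struct sDatabaseRow {
    std::string sNodeName;       // name of the other node maintained in this row
    std::string sBestNeighbour;  // neighbour that is the next best step to that node
};

// Returns the neighbour most frequently named as Best Neighbour; on a tie, falls back
// to the Best Neighbour of the top-ranked row, as described above.
std::string BestNeighbourToCore(const std::vector<sDatabaseRow>& dbRows)
{
    std::map<std::string, int> counts;
    for (const sDatabaseRow& row : dbRows)
        ++counts[row.sBestNeighbour];

    std::string best;
    int bestCount = 0;
    bool tie = false;
    for (const auto& kv : counts) {
        if (kv.second > bestCount) { best = kv.first; bestCount = kv.second; tie = false; }
        else if (kv.second == bestCount) { tie = true; }
    }
    if (tie && !dbRows.empty())
        return dbRows.front().sBestNeighbour;
    return best;
}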

(Applying this core determination method to an earlier example, in Table XII and FIG. 11 recall that the Best Neighbour to the core of network 30a for node N1a would be node N2a; the Best Neighbour to the core of network 30a for node N2a would be node N3a; the Best Neighbour to the core of the network 30a for node N3a would be node N2a; and the Best Neighbour to the core of network 30a for node N4a would be node N2a. Since node N2a points to node N3a, and node N3a points to node N2a, then link L2a would be the “core” of network 30a.)

It should now be apparent that when a network, such as network 30c, is first initialized a plurality of cores will form until method 200 is performed a sufficient number of times such that databases Dc are populated and maintain a substantially steady state. Also, as nodes Nc are added or removed, or links Lc are added or removed (and/or other factors affecting the overall state of the network change), then the location of core Cc can change, and/or multiple cores can form.

Building on the example shown in FIGS. 14-16, and referring now to FIG. 17, it will be assumed that two new links are added to network 30c. Specifically, link L12c now joins nodes N1c and N2c, while link L13c now joins nodes N13c and N14c. As method 200 is performed by nodes N1c and N14c, and re-performed by the remaining nodes Nc, the location of core Cc at link L6c will ultimately not change in this particular configuration of network 30c. However, the contents of each database Dc may change according to the above-mentioned prioritization criteria. For example, node N2c will add node N1c to its database D2c, and drop node N12c from database D2c.

Also of note, node N1c will populate database D1c with knowledge about nodes N2c-N10c, while node N14c will populate database D14c with knowledge about nodes N5c-N13c. Thus, nodes N1c and N14c will not have knowledge of each other. Now assume that node N1c wishes to send payload traffic to node N14c.

Since node N1c has no knowledge of node N14c, at this point node N1c can perform method 800 shown in FIG. 18 in order to gain such knowledge. Beginning at step 810, originating node N1c will receive a request to send payload traffic to destination node N14c. Such a request can come from another application executing on a computing environment associated with originating node N1c.

(As an aside, and as will become more apparent from further teachings herein, to put this entire method in more colloquial terms, a request sent to a neighbour node can be in the form of: ‘if you see route information for my destination node, can you make sure to tell me about it so I can make a good choice on where to send my payload data’. If a node has some payload to send, but no place to send it, it will hang onto that payload until a timeout on the payload expires (if there is one), or it needs that room for other packets, or it gets told a route update that will allow it to route to a directly connected node.)

Next, at step 820, a determination is made as to whether the destination node to which the payload traffic is destined is located in the local database Dc. In this example, recall that database D1c does not include information about destination node N14c, and so the result of this determination would be “no”, and method 800 would advance from step 820 to 830. (If, however, the destination node was in the database D1c, then at step 840, payload traffic could be sent via the Best Neighbour identified in the database, in much the same manner as was described in relation to network 30a in FIG. 12, or network 30b in FIG. 13.)

Next, at step 830 a query will be sent towards the core asking for knowledge of destination node N14c. Such a query will be passed towards the core Cc, by each neighbouring node Nc, along the path of “Best Neighbours” that lead to core Cc, until the query reaches a node Nc that has knowledge of node N14c. Thus, each node Nc will receive the query, examine its own database Dc, and, if it has knowledge of destination node N14c, it will send such knowledge back through the path to originating node N1c. If the node Nc receiving the query does not have knowledge of destination node N14c, then it will pass the query on to the neighbouring node Nc that is its Best Neighbour leading to core Cc, until the query reaches a node Nc that has knowledge of node N14c. In the present example, the query from node N1c will follow the path from node N2c; to node N3c; to node N6c; and finally to node N9c, since node N9c will have knowledge of node N14c due to the prioritization criteria defined above. Thus, the knowledge of node N14c will be passed back through node N6c; to node N3c; to node N2c and finally to node N1c, with nodes N6c, N3c, N2c each keeping a record of the knowledge of node N14c in their respective databases Dc so that they can pass payload traffic on behalf of node N1c.

Next, at step 840 a response will eventually be received by the originating node Nc to the query generated at step 830. In the present example, node N1c will thus receive knowledge back from node N9c about node N14c, and, at step 850, node N1c will update its database D1c with knowledge of node N14c. Method 800 can then advance from step 850 and payload traffic can be sent to node N14c from node N1c, in much the same manner as was described in relation to network 30a in FIG. 12, or network 30b in FIG. 13.
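As a minimal sketch only (the container and helper names below are hypothetical and are not taken from method 800 itself), the handling of the query of step 830 at each node along the path toward the core might look like this:

#include <map>
#include <string>
#include <vector>

// Hypothetical state and helpers assumed for this sketch.
struct sKnownNode { std::string sBestNeighbour; float fCumulativeLinkCost; };
std::map<std::string, sKnownNode> database;                      // this node's local database Dc
std::map<std::string, std::vector<std::string>> pendingQueries;  // destination -> neighbours that asked
std::string BestNeighbourToCore();
void SendKnowledgeTo(const std::string& neighbour, const std::string& destination);
void ForwardQueryTo(const std::string& neighbour, const std::string& destination);

// A query for an unknown destination is answered from the local database if possible,
// and otherwise relayed one step closer to core Cc.
void HandleQuery(const std::string& destination, const std::string& querySource)
{
    if (database.count(destination) != 0) {
        SendKnowledgeTo(querySource, destination);          // knowledge flows back along the query path
    } else {
        pendingQueries[destination].push_back(querySource); // remember whom to answer later
        ForwardQueryTo(BestNeighbourToCore(), destination); // pass the query toward the core
    }
}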

Building on the example shown in FIG. 17, and referring now to FIG. 19, it will be assumed that three new links are added to network 30c. Specifically, link L14c is added to join node N12c and node N8c; link L15c is added between node N8c and node N5c; and link L16c is added between node N5c and node N2c. Each node Nc performs method 200 a number of times to absorb the knowledge of these new links Lc. As such knowledge propagates throughout network 30c, eventually, the path of payload traffic from node N1c to node N14c will travel via nodes N2c; N5c; N8c; N12c and N13c.

It should now be understood that where links L14c, L15c and L16c existed prior to nodes N1c and N14c gaining knowledge of each other, then node N1c will initially gain knowledge of node N14c via the core Cc as described in relation to FIGS. 17 and 18; and then the optimum path (i.e. the path with the fewest number of hops through Best Neighbours) will converge to the example shown in FIG. 19.

While method 800 is directed to “pulling” knowledge of a destination node N that is not known by an originating node from the core Cc, it should now also be appreciated that where a new destination node Nc joins network 30c, that node Nc can also “push” knowledge of itself towards nodes at the core Cc, so that when method 800 is performed an originating node Nc can be sure that it will find information about the new/destination node Nc at core Cc. In the example given in FIG. 14, such a “push” of knowledge was not needed due to the prioritization criteria that automatically ensured that node N9c at core Cc would gain knowledge of node N14c. However, in other configurations of network 30c, a “push” of knowledge of nodes Nc to the core Cc can be desired.

While only specific combinations of the various features and components of the present invention have been discussed herein, it will be apparent to those of skill in the art that desired subsets of the disclosed features and components and/or alternative combinations of these features and components can be utilized, as desired. For example, while the foregoing discussion contemplates substantially synchronous performance of method 200 by each node N (and its variants), it should be understood that such synchronous performance is not necessary and is used merely to simplify explanation.

As another variation, each node N (and its variants) can also keep a separate record of all information that was sent to that node N (and its variants) by neighbouring nodes N (and its variants), even if a particular neighbouring node N (and its variants) was not chosen as the Best Neighbour for storage in that database D. This can allow a node N whose Best Neighbour has been removed to select its next Best Neighbour from the remaining neighbouring nodes N without having to re-run method 200, or otherwise wait for an update from all other remaining neighbour nodes.

The present invention thus provides a novel system, method and apparatus for networking.

Still further embodiments of the invention are contemplated and a review of certain of these embodiments will lead to further understanding of the invention. In the embodiments that follow, certain terms or concepts may differ somewhat from the previous section. Such differences are to be viewed as alternatives and/or supplements to the previous embodiments.

In general, the network architecture of the present invention can enable individual nodes in the network to coordinate their activities such that the sum of their activities allows communication between nodes in the network.

The principal limitation of existing ad-hoc networks used in a wireless environment is their inability to scale past a few hundred nodes, yet the network architecture and associated methods described herein at least mitigate, and in certain circumstances overcome, such prior art scaling problems.

Exemplary embodiments thus follow in order to clarify understanding. These examples, when making specific reference to numbers, other parties’ software or other specifics, are not meant to limit the generality of the method and system described herein. A person of skill in the art will be able to realize when two or more merged concepts could be separately implemented or useful, even if not explicitly described as such. Alternative embodiments should not be considered mutually exclusive unless specifically stated.

In the following embodiments, the following terms are used:

Nodes

Each node in a network is directly connected to one or more other nodes via a link. A node could be a computer, network adapter, switch, wireless access point or any device that contains memory and the ability to process data. However, the form of a node is not particularly limited.

Links

A link is a connection between two nodes that is capable of carrying information. A link between two nodes could be several different links that are ‘bonded’ together. A link could be physical (wires, etc.), actual physical items (such as boxes, widgets, liquids, etc.), computer buses, radio, microwave, light, quantum interactions, sound, etc. A link could be a series of separate links of varying types. However, the form of a link is not particularly limited.

Calculation of Link Cost

‘Link Cost’ is a value that allows the comparison between two or more links. In this document the lower the ‘link cost’ the better the link. This is a standard approach, and someone skilled in the art will be aware of possible variations.

Link cost is a value that is used to describe the quality of the link. The link cost for the link can be based on (but not limited to):

1. line quality

2. uptime

3. link consistency

4. latency

5. bandwidth

6. signal to noise ratio

7. remaining battery power on the node

The link cost will be able to change over time as the factors that it is based on change.

Persons skilled in the art will be able to assign link costs, or create a dynamic discovery mechanism.

It is suggested that the assignment of link costs is consistent across the network. For example, two identical links in different parts of the network should have the same or similar link costs.

It is suggested that the link cost of a pipe has an approximately direct relationship to its quality. For example, a 1 Mbit pipe should have 10 times the link cost of a 10 Mbit pipe. These link costs will be used to find the best path through the network using a Dijkstra-like algorithm.

An alternative embodiment involves randomly varying the calculated link cost by a small random amount that does not exceed 1% (for example) of the total link cost.

Node Names

Each node in the network has a unique name.

This unique name could be:

-   1. Generated by the node.
-   2. Assigned prior to node activation.
-   3. Requested from a central location by the node in a manner similar in result to a DHCP (Dynamic Host Configuration Protocol) server. If a node was to request a name from a central location using this described network, it would first pick a random unique name and use that name to request a name from the central location.

A node may keep its name permanently, or may generate a new name on startup or any time it wants to. Node A can send a message to node B if node A knows the name of node B.

A node may have multiple names in order to emulate several different nodes. For example a node might have two names: ‘Print_Server’ and ‘File_Server’.

A node may generate a new name for each new connection that is established to it.

Ports are discussed as a destination for messages; however, the use of ports in these examples is not meant to limit the invention to only using ports. A person skilled in the art would be aware of other mechanisms that could be used as message destinations. For example, nodes could generate a unique name for each connection.

Usually nodes should have a unique name. An alternative embodiment would allow a node to share a name with another node or nodes in the network. This will be discussed in detail later.

There is no limitation implied by the inventors as to the number of names a node has, how often it adds or removes names, what the name is, or if it tells anyone about the name or names that it has selected.

For the sake of clarity in this document we assume that each node has only one unique name associated with it. This should not be seen as limiting the scope of this invention. A node may share the same name as one or more other nodes in the network.

Establishing Connections Between Nodes

If a link is able to be established between two nodes and these nodes wish to establish a link, then the nodes will need to exchange some information to establish that connection. This information may include version numbers, etc.

Alternative embodiments could include the exchanging of a ‘tie-breaker’ number that will allow a node to choose between two otherwise equal links. It is suggested that the same tie-breaker value is given to all directly connected nodes. If a node A tells node B that it has already seen an equivalent tie-breaker number from some other node then node B will need to pick a new tie-breaker number and send it to all of its directly connected nodes. This process is illustrated in FIG. 20.

The request for a new tie-breaker number might look like this (for example):

struct sRequestNewTieBreaker {
    // This structure is empty; if the node sees this message it will
    // generate a new tie-breaker value and tell all its directly connected
    // nodes this new value
};

Alternative embodiments could include a maximum count of nodes that this node wants to know about. For example, if node A has limited memory it would tell node B to tell it about no more than X different nodes.

Alternative embodiments could include exchanging of link costs for the link that was used to establish the connection. If the link cost changes during the operation of the network a node may send a message to its directly connected node on the other end of the link that the link cost has changed. If link costs are exchanged, nodes may agree on the same link cost or may still pick different link costs, indicating an asymmetrical connection.

If all three previous alternative embodiments were included the message exchanged would look like this (for example):

struct sIntroMessage {
    // the number used to break ties
    int uiTieBreakerNumber;
    // the maximum number of destination nodes
    // this node wants to know about
    int uiNodeCapacityCount;
    // the link cost for the connection between these two nodes
    float fLinkCost;
};

FIG. 21 is a flowchart of the initialization process.

A connection is assumed to be able to deliver the messages in order and error free. If this is not possible it is assumed that the connection will be treated as ‘failed’.

The Spread of Node Knowledge

In order for node A to send a message to node B, node A needs to know the name of node B as well as the directly connected node or link that is the next best step to get to node B.

If node A or node B wants another node to send them messages then they have to tell at least one directly connected node about their name.

When a node has established a link to another node it can start sending node information. Node information includes the name of the node and the cumulative link cost to reach that node. When the network is just turned on, no node knows about the names of any other node except itself, thus the initial cumulative link cost for the nodes that it knows about (itself) would be 0.

When a node receives knowledge of another node A from link L it will add the link cost of the link L to the cumulative link cost it was told for node A and store that information associated with the link L in database D. When the link cost for node A that was received from connection L is referenced by this node from database D, it will implicitly include the link cost for that link L that was added to it.

Each node stores the information that it has received from each link. A node does not need to know the name of the node on the other end of the link. All it needs to do is record the knowledge that the node on the other end of the link is sending it. A node will store all the node updates it has received from neighbour nodes.

When a node N has received knowledge of node B from a link it will compare the cumulative link costs for node B that it has received from other links. It will pick the link with the lowest cumulative link cost as its “Best Neighbour” for the messages flowing to node B. When a node N sends an update for node B to its directly connected nodes it will tell them the name of the node and the lowest cumulative link cost that it has received from its directly connected nodes.
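The per-link bookkeeping and ‘Best Neighbour’ selection described above might be sketched as follows; this is an illustrative sketch only, and the container and function names are hypothetical rather than part of the method.

#include <limits>
#include <map>
#include <string>

// Hypothetical per-node state: for each destination node name, the cumulative link cost
// as heard over each link (the cost of the link itself is added in when stored).
std::map<std::string, std::map<std::string, float>> knowledgePerLink; // dest -> (link -> cumulative cost)
std::map<std::string, float> linkCost;                                // link -> cost of that link

// Store an update for destination 'dest' received over 'link', then return the link with
// the lowest cumulative link cost as the 'Best Neighbour' for that destination.
std::string UpdateAndPickBestNeighbour(const std::string& dest, const std::string& link,
                                       float fCumulativeLinkCostTold)
{
    knowledgePerLink[dest][link] = fCumulativeLinkCostTold + linkCost[link];

    std::string bestLink;
    float fBest = std::numeric_limits<float>::infinity();
    for (const auto& kv : knowledgePerLink[dest]) {
        if (kv.second < fBest) { fBest = kv.second; bestLink = kv.first; }
    }
    // fBest is the lowest cumulative link cost, which is what would be told to
    // directly connected nodes in an update for 'dest'.
    return bestLink;
}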

Cumulative link cost is additive. FIG. 22 demonstrates this additive property.

This process continues until knowledge of the node has spread through the entire network and each node has selected one link as having the lowest cumulative link cost.

This process is very similar to Dijkstra’s algorithm, or the Bellman-Ford algorithm, for finding the shortest path through a network. Someone skilled in the art will recognize such approaches, and the variations that yield a similar result.

FIGS. 23-26 show the flow of node knowledge through a network. All the links are assumed to have a cost of one. This is considered to be an example only and in no way is meant to limit the generality of this invention. For example, links may have different link costs, and the number of nodes and their specific interconnections may be infinitely varied.

At no point does a node need to build a global view of the network topology. A node is only aware of the node knowledge its directly connected neighbour nodes have told it. This type of network might be compared to a distance vector network by someone skilled in the art.

An alternative embodiment could use the tie-breaker number (discussed earlier) to pick between two or more links with the lowest cumulative link cost.

The structure for a message that spreads node knowledge might look like this (for example):

struct sNodeKnowledge {
    Name NameOfTheNode;
    float fCumulativeLinkCost;
};

The fCumulativeLinkCost should be set to zero on the node with that particular name.

Alternative embodiments could have the fCumulativeLinkCost set to non-zero on the node with that particular name. This could be used to disguise the true location of a destination node. Setting the fCumulativeLinkCost to non-zero on the node with that particular name (for example 50) will not affect the convergence of the network.

Link Cost Changes

If a link cost changes then the node will need to take the difference between the new link cost and the old link cost and add it to the cumulative link cost for all node information that has been received from that link.

Below is exemplary pseudo code that shows how cumulative link cost can be adjusted for each node update that was received from the link that changed its link cost.

CumulativeLinkCost = CumulativeLinkCost + (NewLinkCost - OldLinkCost);
if (CumulativeLinkCost > INFINITY)
    CumulativeLinkCost = INFINITY;

On the basis of this change it will also re-evaluate its choice of ‘Best Neighbour’. It will also need to tell its neighbours about any nodes where the lowest cumulative link cost for a particular destination node changed.

For example, if the link cost for a link that was not chosen as a ‘Best Neighbour’ for a destination node A changes, and after the change that link is still not chosen as a ‘Best Neighbour’ for destination node A, then the cumulative link cost would remain the same for node A and no updates would need to be sent to directly connected nodes.
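Building on the hypothetical state of the earlier Best Neighbour sketch (the names here are likewise assumptions for illustration, not part of the method), a link cost change could be handled roughly as follows: adjust the stored costs for that link, re-evaluate the Best Neighbour per destination, and only schedule an update when the lowest cumulative link cost actually changed.

#include <limits>
#include <map>
#include <string>

extern std::map<std::string, std::map<std::string, float>> knowledgePerLink; // dest -> (link -> cost)
extern std::map<std::string, float> lowestCumulativeCost;                    // dest -> last cost advertised
std::string PickBestNeighbour(const std::string& dest);                      // lowest-cost link for dest
void ScheduleUpdateToNeighbours(const std::string& dest, float fNewLowestCost);

void OnLinkCostChanged(const std::string& changedLink, float fNewCost, float fOldCost)
{
    for (auto& destEntry : knowledgePerLink) {
        auto it = destEntry.second.find(changedLink);
        if (it == destEntry.second.end())
            continue;                            // nothing was heard over this link for that destination
        it->second += (fNewCost - fOldCost);     // adjust the stored cumulative link cost

        const std::string bestLink = PickBestNeighbour(destEntry.first);
        const float fNewLowest = destEntry.second.count(bestLink)
                                     ? destEntry.second[bestLink]
                                     : std::numeric_limits<float>::infinity();
        if (fNewLowest != lowestCumulativeCost[destEntry.first]) {
            lowestCumulativeCost[destEntry.first] = fNewLowest;
            ScheduleUpdateToNeighbours(destEntry.first, fNewLowest); // only send when the best cost changed
        }
    }
}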

Link Removal (If a Link is Removed)

This can be looked at the same way as the link cost for that link going to infinity.

For each destination node that was using this link as its ‘Best Neighbour’ the next best ‘non-infinity’ alternative will be selected. If there is no such alternative then no ‘Best Neighbour’ can be selected and all directly connected nodes will be told a cumulative link cost of infinity for those nodes.
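Under the same hypothetical state as the previous sketch, link removal can simply reuse that handling with an infinite new cost:

#include <limits>
#include <string>

void OnLinkCostChanged(const std::string& changedLink, float fNewCost, float fOldCost);

// Removing a link is treated as its cost going to infinity. Destinations whose only
// remaining choice was this link end up with an infinite cumulative link cost, so no
// 'Best Neighbour' is selected for them and directly connected nodes are told a
// cumulative link cost of infinity.
void OnLinkRemoved(const std::string& removedLink, float fOldCost)
{
    OnLinkCostChanged(removedLink, std::numeric_limits<float>::infinity(), fOldCost);
}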

If no ‘Best Neighbour’ is selected then messages destined for those nodes will not be able to be sent.

Large Networks

In large networks with a large variation in interconnect speed and node capability, different techniques need to be employed to ensure that any given node can connect to any other node in the network, even if there are millions of nodes.

Using the original method, knowledge of a destination node will spread quickly through a network. The problem in large networks is threefold:

-   1. The bandwidth required to keep every node informed of all destination nodes grows to a point where there is no bandwidth left for data.
-   2. Bandwidth throttling on destination node updates, used to ensure that data can flow, will greatly slow the propagation of destination node updates.
-   3. Nodes with not enough memory to know of every node in the network will be unable to connect to every node in the network, and may also limit the ability of nodes with sufficient resources to connect to every node in the network.

A solution is found by introducing the idea of the ‘core’ or center of the network.

The core of the network will most likely have nodes with more memory and bandwidth than an average node, and is most likely to be centrally located topologically.

Since this new network system does not have any knowledge of network topology, or of any other nodes in the network except the nodes directly connected to it, nodes can only approximate where the core of the network is.

This can be done by examining which link is a ‘Best Neighbour’ for the most destination nodes. A directly connected link is picked as a ‘Best Neighbour’ for a destination node because it has the lowest cumulative link cost. The lowest link cost will generally be provided by the link that is closest to the ultimate destination node. If a link is used as a ‘Best Neighbour’ for more destination nodes than any other link, then this link is considered a step toward the core, or center, of the network.
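A minimal sketch of this credit counting follows; the names are hypothetical, and a real implementation could equally apply the alternative credit metrics discussed below.

#include <map>
#include <string>

// Give each directly connected link a credit of 1 for every destination node that selected
// it as 'Best Neighbour'; the link with the most credit is the next best step toward the core.
std::string NextBestStepToCore(const std::map<std::string, std::string>& bestNeighbourByDest)
{
    std::map<std::string, int> credit; // link -> number of destination nodes using it
    for (const auto& kv : bestNeighbourByDest)
        ++credit[kv.second];

    std::string bestLink;
    int bestCredit = -1;
    for (const auto& kv : credit) {
        if (kv.second > bestCredit) { bestCredit = kv.second; bestLink = kv.first; }
        // A tie can be broken with the 'tie-breaker' value exchanged at initialization
        // (discussed below), or by a random selection.
    }
    return bestLink;
}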

An alternative embodiment could be the node making a decision as to the next best step to some other node or beacon, and using this as its ‘next best step to the core’.

An alternative embodiment could be the node using some combination of factors to determine what its ‘next best step to the core’ is. These factors could include some combination of (although not limited to):

1. Radio direction finding of some target beacon

2. GPS position co-ordinates, and the next best step to some location

3. A special marker node or nodes.

4. Other externally measurable factors

A node does not need to know where the center of the network is, only its next best step toward the center of the network.

A core can be defined as when two nodes select each other as their next best step towards the core. There is nothing special about the core; the two nodes that form the core act as any other nodes in the network would act.

If there is a tie between a set of directly connected nodes for who was picked as the ‘Best Neighbour’ for the most destination nodes, the directly connected node with the highest ‘tie-breaker’ value (which was passed during initialization) will be selected as the next best step towards the core. This mechanism will ensure that there are no loops in a non-trivial network (besides Node A->Node B->Node A type loops). If this tie-breaker embodiment is not used, then a random selection can be made.

This idea of using a node’s ‘next best step to the core’ forms a hierarchy. This hierarchy can be used to push specific node knowledge up the hierarchy to the top of the tree. The HSPP's (discussed later) exploit this hierarchy to push (or pull) node knowledge up and down this hierarchy.

FIG. 54 is an example of a network where each node has selected a directly connected node as its next best step to the core. The network is then rearranged to better show the nature of the hierarchy that is created. As the network topology changes so will the hierarchy that is formed.

Detecting an Isolated Core

This is an alternative embodiment that helps in the detection of ‘isolated cores’.

A core is defined as two directly connected nodes that have selected each other as the next best step to the core. FIG. 27 illustrates an example of this.

When a node has chosen a directly connected node as its ‘next best step to the core’ it will tell that directly connected node of its choice. This allows nodes to detect when they have generated a core that no other nodes are using as their core.

The message that is passed can look like this:

struct sCoreMessage {
    bool bIsNextStepToCore;
};

If a core is created, both nodes that form the core (in this example Node A and Node B) will check to see how many directly connected nodes they have. If there is more than one directly connected node then they will examine all the other directly connected nodes.

If the only directly connected node that has chosen this node as its next best step to the core is the node that has caused the core to be created, then this node will select its next best choice to be the next best step to the core.

This can help eliminate cores that can block the flow of knowledge to the real core.

Exemplary Alternative Ways to Select the Next Best Step to the Core

The approach discussed previously involved assigning a credit of ‘1’ to a directly connected node for each destination node that selects that directly connected node as a ‘Best Neighbour’. The node with the highest count is the next best step to the core (or, in the case of an isolated core, the node with the second highest count).

If any embodiment uses multiple ‘Best Neighbours’ (such as multipath, discussed later), then each ‘Best Neighbour’ chosen for each destination node could be assigned the appropriate credit. Alternatively, only that ‘Best Neighbour’ with the best latency (in the case of multipath) could be assigned the credit.

Instead of assigning a credit of one to each directly connected node for each destination node that selects it as its best choice, other values can be used.

For example, log(fCumulativeLinkCost+1)*500 can be the credit assigned. Other metrics could also be used. This metric has the advantage of giving more weight to those destination nodes that are further away. In a dense mesh with similar connections and nodes, this type of metric can help better, more centralized cores form.

Another possible embodiment, which can be used to extend the idea of providing more weighting to destination nodes that are further away, is to order all destination nodes by their link costs, and only use the x% (for example, 50%) that are the furthest away to determine the next best step to the core.

Another embodiment can use a weighting value assigned to each node. This weight could be assigned by the node that created the name. For example, if this weighting value was added to the node update structure it would look like this:

struct sNodeKnowledge {
    Name NameOfTheNode;
    float fCumulativeLinkCost;
    int nWeight;
};

The nWeight value (that is in the sNodeKnowledge structure) can be used to help cores form near more powerful nodes. For example the credit assigned could be multiplied by 10^nWeight (where 10 is an example).

This will help cores form near the one or two large nodes, even if they are surrounded by millions of very low power nodes.
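As a hedged illustration of the alternative credit metrics described above (the function names are hypothetical), the credit assigned per destination node might be computed in one of the following ways:

#include <cmath>

double PlainCredit()                                      // the original approach: one credit each
{
    return 1.0;
}

double DistanceWeightedCredit(float fCumulativeLinkCost)  // gives more weight to distant destinations
{
    return std::log(fCumulativeLinkCost + 1.0) * 500.0;
}

double NodeWeightedCredit(int nWeight)                    // helps cores form near more powerful nodes
{
    return std::pow(10.0, static_cast<double>(nWeight));  // 10 is an example base only
}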

The nWeight value should be assigned in a consistent fashion across all nodes in the network. Possible nWeight values for types of nodes:

nWeight value   Type of Node                                             Equivalent to X low capacity sensors
0               A very low capacity sensor or mote                       1
1               A bigger sensor                                          10
2               A bigger sensor with more battery life and memory        100
3               A cell phone                                             1000
10              A home computer                                          10000000000
15              A core router                                            1000000000000000
20              A super computer with massive connectivity and memory    100000000000000000000

These weight values are suggestions only; someone skilled in the art would be able to assign suitable values for their application.

Next Step to the Core in a Network with Asymmetric Link Costs

This is an alternative embodiment for choosing the next best step to the core.

If a link is given an asymmetric cost, for example the link L that joins nodes A and B has a cost of 10 when going from A to B and a cost of 20 when going from B to A, then an alternative embodiment is useful to help the core form in a single location in the network.

In an earlier embodiment the nodes agree on the link cost for a particular link and base their ‘Best Neighbour’ selection on this shared link cost.

If asymmetric link costs are used to determine the ‘Best Neighbour’, then symmetric link costs can be used to choose the next step to the core. Using symmetric link costs can help ensure that a core actually forms.

For each node that a node knows about it will decide which link is its next best step to reach that node. It chooses this next best step based on cumulative link cost, and perhaps a tie-breaker number. This ‘Best Neighbour’ is then given a credit that will be summed with the other credits assigned to it. The ‘Best Neighbour’ with the most credit will be picked as the next best step to the core.

In this alternative embodiment a node will agree with the node it is linked to on an alternative cost for the link. This alternative link cost will be the same for both nodes. This alternative link cost will be used to adjust the cumulative link cost. A choice for ‘Best Neighbour’ will be made with this alternative cumulative link cost. This ‘Best Neighbour’ will be assigned the credit that goes towards picking it as the next best step to the core, even if it was not the ‘Best Neighbour’ picked as the next best step to the actual node.

This equation describes how the alternative cumulative link cost can be calculated.

AlternativeCumulativeLinkCost = ActualCumulativeLinkCost + (AlternativeLinkCost − ActualLinkCost)

High Speed Propagation Path(s) (“HSPP”)

Since nodes not at the core of the network will generally not have as much memory as nodes at the core, they may forget about a node N that relies on them to allow others to connect to node N. If these nodes did forget, no other node in the network would be able to connect to that node N.

In the same way, a node that is looking to establish a connection with a node Q faces the same problem. The knowledge of node Q that it is looking for won’t reach it fast enough, or maybe not at all, if either node Q or the node that is trying to connect to it is surrounded by low capacity nodes.

An approach is to use the implicit hierarchy created by each node’s choice as to its ‘next best step to the core’. Node knowledge is pushed up and down this hierarchy to the core. This allows efficient transfer of node knowledge to and from the center of the network.

Node knowledge can be pushed/pulled using a methodology referred to herein as a High Speed Propagation Path (“HSPP”). An HSPP can be thought of as a marked path (or paths) between a node and the core. Once that path has been set up it is maintained until the node that created it has been removed.

There are two types of HSPP's. The first is a notify HSPP. The Notify HSPP pulls knowledge of a particular node towards the core. Nodes that have an HSPP running through them are not allowed to forget about that node that is associated with the HSPP. All nodes create a Notify HSPP to drive knowledge of themselves towards the core.

A request HSPP is only created when a node is looking for knowledge of another node. The request HSPP operates in the identical way to the notify HSPP except that instead of pulling knowledge towards the core it pulls knowledge back to the node that created it.

(Persons skilled in the art will appreciate that, in the context of an HSPP, the terms “push” and “pull” are useful for illustrative purposes, but can be viewed as somewhat artificial terms, since in effect, an HSPP improves the ‘rank’ of a node in a node database so that knowledge of that node is sent before the knowledge of other nodes.)

An HSPP travels to the core using each node’s next best step to the core. Each node’s ‘next best step to the core’ creates an implicit hierarchy.

An HSPP is not a path for user messages itself; rather, it forces nodes on the path to retain knowledge of the node or nodes in question, and send knowledge of that node or nodes quickly along the HSPP. It also raises the priority of updates for the node names associated with the HSPP. This has the effect of sending route updates quickly towards the top of this implicit hierarchy.

The HSPP does not specify where user data messages flow. The HSPP is only there to guarantee that there is always at least one path to the core, and to help nodes form an initial connection to each other. Once an initial connection has been formed, nodes no longer need to use the HSPP.

An HSPP may be referenced as belonging to one node name, or being associated with one node name. This in no way limits the number, or type, of nodes that an HSPP may be associated with. In this embodiment the name of the HSPP is usually the name of the node that the HSPP will be pushing/pulling to/from the core.

An HSPP is tied to a particular node name or class/group of node names. If a node hosts an HSPP for a particular destination node it will immediately process and send node knowledge of any nodes that are referenced by that HSPP.

Node knowledge in this case can be viewed as a sNodeKnowledge update (for example).

An alternative embodiment could limit that processing to:

-   1. Initial knowledge of the destination node.
-   2. When the destination node fCumulativeLinkCost goes to infinity.
-   3. When the destination node fCumulativeLinkCost moves from infinity to some other value.

This can ensure that all nodes in the HSPP will always know about the nodes referenced by the HSPP if any one of those nodes can ‘see’ the node or nodes referenced by the HSPP.

Node knowledge is not contained in the HSPP. The HSPP only sets up a path with a very high priority for knowledge of a particular node or nodes. This means that node updates for those nodes referenced by the HSPP will be immediately sent.

An HSPP is typically considered a bi-directional path.

Alternative embodiments can have two types of HSPP's. One type pushes knowledge of a node towards the core. This type of HSPP could be called a notify HSPP. The second pulls knowledge of a node towards the node that created the HSPP. This type of HSPP could be called a request HSPP.

When a node is first connected to the network, it can create an HSPP based on its node name. This HSPP will push knowledge of this newly created node towards the core of the network. The HSPP created by this node can be maintained for the life of the node, or for as long as the node wants to maintain the HSPP. If the node is disconnected or the node decides to no longer maintain the HSPP, then it will be removed. An alternative embodiment could have this HSPP be a ‘push HSPP’ instead of a bi-directional HSPP.

If a node is trying to connect to another node N, it will create an HSPP that references that node N. This HSPP will travel to the core and help pull back knowledge of node N to the node that created the HSPP and wants to connect to node N. An alternative embodiment could have this HSPP be a ‘pull HSPP’ instead of a bi-directional HSPP.

If the node no longer wants to maintain an HSPP (perhaps because the connection to node N is no longer needed) it can send an HSPP update with ‘bActive’=false to all the directly connected nodes to which it sent the original HSPP with bActive=true. This should only be done by the node that has created the HSPP.

Alternative embodiments could allow the request HSPP to be maintained by the node that generated the request HSPP after the connection has been dropped for some amount of time, in order to facilitate faster re-connects.

Both types of HSPP will travel to the core. The HSPP that sends knowledge of a node to the core can be maintained for the life of the node. The request HSPP will probably be maintained for the life of the connection.

An alternative embodiment has nodes that create an HSPP send that HSPP to all directly connected nodes, instead of only to their next best step to the core of the network. This embodiment allows the network to be more robust while moving and shifting.

An alternative embodiment includes making sure that a node will not send a directly connected node more HSPP's than the maximum nodes requested by that directly connected node.

An HSPP does not specify where user data should flow; it only helps to establish a connection (possibly non-optimal) between nodes, or between one node and the core.

An HSPP will travel to the core even if it encounters node knowledge before it reaches the core. Alternative embodiments can have the HSPP stop before it reaches the core.

How an HSPP is Established and Maintained

If a node is told of an HSPP it remembers that HSPP until it is told to forget about that HSPP, or the connection between it and the node that told it of that HSPP is broken.

In an embodiment where a node limits the number of nodes it wants to be told about, that node stores as many HSPP's as are given to it. A node should not send more HSPP's to a directly connected node than the maximum destination node count that directly connected node requested.

In certain systems the amount of memory available on nodes will be such that it can be assumed that there is enough memory, and that no matter how many HSPP's pass through a node it will be able to store them all. This is even more likely because the number of HSPP's on a node will be roughly related to how close this node is to the core, and a node is usually not close to a core unless it has lots of capacity, and therefore probably lots of memory.

UR = Ultimate Receiver Node
US = Ultimate Sender Node

An HSPP takes the form of:

struct sHSPP {
    // The name of the node could be replaced with a number
    // (discussed later). It may also represent a class of nodes
    // or node name.
    sNodeName nnName;
    // a boolean to tell the node if the HSPP is being
    // activated or removed.
    bool bActive;
    // a boolean to decide if this is a UR (or US) generated HSPP
    bool bURGenerated;
};

In these descriptions an HSPP H is considered to be called HSPP H regardless of what bActive or bURGenerated (more generally the HSPP Type) are. The HSPP H can derive its name from the node name that it represents.

An alternative embodiment might be where the name of HSPP H is not linked to the node name (or names) it references. The HSPP structure in this embodiment might look like this (for example):

struct sAlternateHSPP {
    // a unique name to represent this HSPP
    sHSPPName HSPPName;
    // The name of the node could be replaced with a number
    // (discussed later). It may also represent a class of nodes
    // or node name.
    sNodeName nnName;
    // a boolean to tell the node if the HSPP is being
    // activated or removed.
    bool bActive;
    // a boolean to decide if this is a UR (or US) generated HSPP
    bool bURGenerated;
};

The following description uses sHSPP (as opposed to sAlternateHSPP) in order to describe how HSPP's work. This should not be seen as limiting the generality of the method.

A UR generated HSPP can also be called a ‘Notify HSPP’ and a US generated HSPP can also be called a ‘Request HSPP’.

It is important that the HSPP does not loop back on itself, even if the HSPP's path is changed or broken. This should be guaranteed by the process in which the next step to the core of the network is generated.

A node should never send an active HSPP H (bActive=true) to a node that has sent it an active HSPP H.

A node will record the number of directly connected nodes that tell it to maintain the HSPP H (bActive is set to true in the structure). If this count drops to zero it will tell its directly connected nodes that were sent an active HSPP H (bActive=true), an inactive HSPP H (bActive=false).
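A hedged sketch of this bookkeeping follows; the containers and the SendHSPP helper are hypothetical names introduced only for illustration.

#include <map>
#include <set>
#include <string>

std::map<std::string, std::set<std::string>> activeFrom;   // HSPP name -> neighbours asserting it active
std::map<std::string, std::set<std::string>> activeSentTo; // HSPP name -> neighbours we sent it active to
void SendHSPP(const std::string& neighbour, const std::string& hsppName, bool bActive);

// Track how many directly connected nodes maintain HSPP H; when the count drops to zero,
// tell every neighbour that was sent an active HSPP H an inactive HSPP H.
void OnHSPPUpdate(const std::string& neighbour, const std::string& hsppName, bool bActive)
{
    if (bActive)
        activeFrom[hsppName].insert(neighbour);
    else
        activeFrom[hsppName].erase(neighbour);

    if (activeFrom[hsppName].empty()) {
        for (const std::string& n : activeSentTo[hsppName])
            SendHSPP(n, hsppName, false);
        activeSentTo[hsppName].clear();
    }
}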

At a broad level the HSPP finds a non-looping path to the core, and when it reaches the core it stops spreading. It does this because the two nodes that form the core will select each other as their next best step towards the core. And since an active HSPP will not be sent to a node that has already sent it an active HSPP, the HSPP will only be sent by one node of the two node core.

If the HSPP path is cut, the HSPP from the cut to the core will be removed. It will be removed because the only node that told it an active HSPP will be removed. This will prompt the node on the core side of the cut to tell those nodes that it told an active HSPP H an inactive HSPP H. In most cases this process will cascade towards the core removing that active HSPP.

An HSPP will travel to the core using each node’s next best step to the core.

The purpose of the HSPP generated by the UR is to maintain a path between it and the core at all times, so that all nodes in the system can find it by sending a US generated HSPP (a request HSPP) to the core.

If a node N receives an active HSPP H from any of its directly connected nodes, it will send on an active HSPP H to the node (or nodes) selected as its next best step to the core, assuming that the node (or nodes) that has been selected as its next best step to the core has not sent it an active HSPP H.

If multiple active HSPPs H arrive at the same node, that node will send on an HSPP with bURGenerated marked as true, if any of the incoming HSPP's have their bURGenerated marked as true.

If the directly connected node that was selected as the next best step to the core changes from node A to node B, then all the HSPP's that were sent to node A will be sent to node B instead (assuming that the next ‘best step to the core’ has not sent this node an active HSPP of the same name already). Those HSPP's that were sent to node A will have an HSPP update sent to node A with their ‘bActive’ values set to false, and the ones sent to node B will have their ‘bActive’ values set to true.
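Continuing with the hypothetical state of the previous sketch, the re-pointing of HSPP's when the next best step to the core changes might look like this:

#include <map>
#include <set>
#include <string>

extern std::map<std::string, std::set<std::string>> activeFrom;   // HSPP name -> neighbours asserting it
extern std::map<std::string, std::set<std::string>> activeSentTo; // HSPP name -> neighbours we told
void SendHSPP(const std::string& neighbour, const std::string& hsppName, bool bActive);

// Every HSPP previously sent to oldNext is withdrawn from oldNext and sent to newNext
// instead, unless newNext has already sent this node an active HSPP of the same name.
void OnNextStepToCoreChanged(const std::string& oldNext, const std::string& newNext)
{
    for (auto& entry : activeSentTo) {
        const std::string& hsppName = entry.first;
        std::set<std::string>& sentTo = entry.second;
        if (sentTo.erase(oldNext) == 0)
            continue;                              // this HSPP was not routed via oldNext
        SendHSPP(oldNext, hsppName, false);        // 'bActive' set to false toward the old next step
        if (activeFrom[hsppName].count(newNext) == 0) {
            SendHSPP(newNext, hsppName, true);     // 'bActive' set to true toward the new next step
            sentTo.insert(newNext);
        }
    }
}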

An alternative embodiment is that if node A creates an HSPP it should send the HSPP to all directly connected nodes. This ensures that even if this node is moving rapidly, knowledge is always driven to or from the core.

When a node A establishes a connection to another node B, node A can use an HSPP to pull route information for node B to itself (called a request HSPP). This HSPP should also be sent to all directly connected nodes.

An alternative embodiment has only one type of HSPP that moves data bi-directionally. This type of HSPP would be able to replace both a push/notify HSPP and a pull/request HSPP. In this embodiment the bURGenerated parameter is omitted.

An HSPP does not need to be continually resent. Once an HSPP has been established in a static network, no additional HSPP messages need to be sent. This will be apparent to someone skilled in the art.

Each node remembers which directly connected nodes have told it about which HSPP's; a node also typically remembers which HSPP's it has told to directly connected nodes.

Alternative HSPP Types

This alternative embodiment can help maintain a connection in a low bandwidth environment.

In the previous embodiment there are two types of HSPP:

1. Notify HSPP

2. Request HSPP

This embodiment will introduce a new type of HSPP called a ‘Priority Notify HSPP’.

The ‘Priority Notify HSPP’ is the same as the ‘notify HSPP’ except that it will be sent before all ‘Notify HSPP's’. This will be discussed later.

For example, if a node is attempting to communicate with another node, or is aware that another node is attempting to communicate with it, then it can change its notify HSPP's into ‘Priority Notify HSPP's’.

The following table describes what type of HSPP a node will send to its next best step to the core, given the types of HSPP's it receives for a particular destination node.

HSPP's In                                            HSPP Out
Request HSPP                                         Request HSPP
Notify HSPP                                          Notify HSPP
Priority Notify HSPP                                 Priority Notify HSPP
Request HSPP + Notify HSPP                           Notify HSPP
Request HSPP + Priority Notify HSPP                  Priority Notify HSPP
Request HSPP + Notify HSPP + Priority Notify HSPP    Priority Notify HSPP
Notify HSPP + Priority Notify HSPP                   Priority Notify HSPP

If the entries that contain ‘Priority Notify HSPP’ are ignored, this will also describe how the other embodiment decides which HSPP type to send to its next best step to the core.
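The table reduces to a simple rule, sketched below under the assumption of the three HSPP types above (the function name is hypothetical): any Priority Notify in produces a Priority Notify out; otherwise any Notify produces a Notify; otherwise a Request produces a Request.

enum eHSPPType { HSPP_REQUEST, HSPP_NOTIFY, HSPP_PRIORITY_NOTIFY };

// Decide which HSPP type to send to the next best step to the core, given which
// HSPP types were received for a particular destination node.
eHSPPType OutgoingHSPPType(bool bGotRequest, bool bGotNotify, bool bGotPriorityNotify)
{
    if (bGotPriorityNotify)
        return HSPP_PRIORITY_NOTIFY;
    if (bGotNotify)
        return HSPP_NOTIFY;
    return HSPP_REQUEST; // bGotRequest is assumed true if the other two are false
}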

The HSPP structure might be amended to look like this:

struct sHSPP {
    // The name of the node could be replaced with a number
    // (discussed previously)
    sNodeName nnName;
    // a boolean to tell the node if the HSPP is being
    // activated or removed.
    bool bActive;
    // the HSPP Type (for ex: Request HSPP, Notify HSPP,
    // Priority Notify HSPP)
    int nHSPPType;
};

Ordering HSPP's to be Sent

An alternative embodiment adjusts the order that HSPP's are sent.

When a node receives an HSPP it will need to order it before sending. This will ensure that more important HSPP's are sent first.

The order in which HSPP's should be sent, if the ‘Priority Notify HSPP’ embodiment is not used, is:

1. Request HSPP

2. Notify HSPP

If the ‘Priority Notify HSPP’ embodiment is used then the order is this:

1. Request HSPP and Priority Notify HSPP

2. Notify HSPP

Removing Simple Loops

This alternative embodiment can be used to stop simple loops from forming.

Someone skilled in the art will recognize the variations on the ‘poison reverse’.

If a node A has picked node B as a ‘Best Neighbour’ for messages going to node N, then node A will tell node B that it has been picked.

For example, node A could send node B a message that looks like this:

struct sIsBestNeighbour {
    Name NodeName;
    bool bIsBestNeighbour;
};

If node A has told node B that it is the ‘Best Neighbour’ for messages going to node N then node B will be unable to pick node A as the ‘Best Neighbour’ for messages going to node N. If the only possible choice node B has for messages going to node N is node A then B will select no ‘Best Neighbour’ and set its cumulative link cost to node N to infinity.
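A hedged sketch of this ‘poison reverse’-style selection follows; the containers and names are hypothetical and introduced only to illustrate the rule above.

#include <limits>
#include <map>
#include <string>

std::map<std::string, std::map<std::string, float>> costPerLink; // dest -> (link -> cumulative cost)
std::map<std::string, std::map<std::string, bool>> pickedUsFor;   // link -> (dest -> has that neighbour picked us?)

// Pick a 'Best Neighbour' for 'dest', skipping any neighbour that has already told this
// node that it is that neighbour's Best Neighbour for 'dest'. An empty result means no
// Best Neighbour is selected and the cumulative link cost for 'dest' is set to infinity.
std::string PickBestNeighbourAvoidingLoop(const std::string& dest)
{
    std::string bestLink;
    float fBest = std::numeric_limits<float>::infinity();
    for (const auto& kv : costPerLink[dest]) {
        if (pickedUsFor[kv.first][dest])
            continue; // that neighbour already routes traffic for dest through us
        if (kv.second < fBest) { fBest = kv.second; bestLink = kv.first; }
    }
    return bestLink;
}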

Marking Nodes as in the Data Stream

This alternative embodiment can be used to mark those nodes that are in the data stream.

In this embodiment a node is only considered as ‘in the data stream’ if it is marked as ‘in the data stream’. A node may forward payload packets without being marked in the data stream. If a node is forwarding payload packets, but is not marked in the data stream, it is not considered as ‘in the data stream’.

If a node A has attempted to establish a data connection to another node N in the network it will tell the node B that it has selected as its ‘Best Neighbour’ to node N that node B is a ‘Best Neighbour’ for node N and it is in the data stream for node N.

If a node B has been told that it is in the data stream by a directly connected node that has told B that it is a ‘Best Neighbour’ then node B will tell the directly connected node C that it has selected as a ‘Best Neighbour’ for node N that node C is in the data stream.

As an example the structure of this message might look like this:

struct sInTheDataStream {
    Name NodeName;
    bool bIsInTheDataStream;
};

If node B was marked as in the data stream for messages going to node N it would tell the node that it has selected as its next best step to node N that it is in the data stream. If node B is no longer marked as in the data stream because:

-   1. the directly connected node (or nodes) that had told node B that it was in the data stream disconnected; or
-   2. the directly connected node (or nodes) that told node B that it was in the data stream all told node B that it was no longer in the data stream,

then node B will tell its ‘Best Neighbour’ C that it is no longer in the data stream.

A node is only marked as being in the data stream by this flag. A node may forward message packets without being marked as in the data stream.

Link Cost from Stream

The term ‘link cost from stream’ is sometimes referred to herein as ‘hop cost from flow’.

This alternative embodiment can be used to order the node updates in a network. This ordering allows the network to become much more efficient by sending updates to maintain and converge data flows before other updates.

The sNodeKnowledge structure used to pass node knowledge around might be modified to look like this (for example):

struct sNodeKnowledge {
    Name NameOfTheNode;
    float fCumulativeLinkCost;
    float fCumulativeLinkCostFromStream;
};

The fCumulativeLinkCostFromStream is incremented in the same way as the fCumulativeLinkCost. However, if a node is in the data stream for a particular node it will reset the fCumulativeLinkCostFromStream to 0 before sending the update to its directly connected nodes.

Just as the fCumulativeLinkCost is initialized to zero, the fCumulativeLinkCostFromStream is also initialized to zero.
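A minimal sketch of how an outgoing update might be prepared under this embodiment follows (hypothetical names; the structure simply mirrors the modified sNodeKnowledge above):

#include <string>

struct sNodeKnowledgeUpdate {
    std::string NameOfTheNode;
    float fCumulativeLinkCost;
    float fCumulativeLinkCostFromStream;
};

// A node in the data stream for the destination resets fCumulativeLinkCostFromStream to 0
// before sending the update on; the receiving node adds its own link cost to both fields.
sNodeKnowledgeUpdate PrepareOutgoingUpdate(sNodeKnowledgeUpdate update, bool bInDataStreamForNode)
{
    if (bInDataStreamForNode)
        update.fCumulativeLinkCostFromStream = 0.0f;
    return update;
}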

An alternative embodiment could have the fCumulativeLinkCostFromStream reset for other reasons as well, such as user data messages being passed through that node. Someone skilled in the art will recognize such variations.

An alternative embodiment to help in low bandwidth environments is to have nodes set their fCumulativeLinkCostFromStream to a non-zero value (for example 50) if they are not exchanging user data with another node. If they are in communication with another node they would set their fCumulativeLinkCostFromStream to 0. An alternative embodiment could also set a non-zero fCumulativeLinkCostFromStream to a multiple of the min, max, average (etc.) of the link costs associated with the links that this node has established.

If the fCumulativeLinkCost goes to infinity, then keep the last non-infinity fCumulativeLinkCostFromStream value. This will be used to order when to send the infinity update to directly connected nodes.

Alternative Link Cost from Stream

This embodiment is similar to the previous embodiment, except that it is more useful in helping the network remove node knowledge.

In the previous embodiment the fCumulativeLinkCostFromStream got reset to zero when it came across a node that was marked as in the data stream. This embodiment changes what type of update will be sent.

If a node A that created the destination node E (this could also be described as a node A that created a node name E for use by node A) is told by a directly connected node B that it is in the data stream for node E, then node A will tell that directly connected node B a node update for node E where the fCumulativeLinkCostFromStream = fCumulativeLinkCost. In most cases this will have both these values set to zero since node A has created the name E.

All other directly connected nodes that have not told node A that it is in the data stream will be told a fCumulativeLinkCostFromStream != fCumulativeLinkCost. For example:

fCumulativeLinkCostFromStream = fCumulativeLinkCost + 0.1f;

Since fCumulativeLinkCost is usually zero, these directly connected nodes would be told a fCumulativeLinkCostFromStream of 0.1 and a fCumulativeLinkCost of 0. The value 0.1 should be viewed as exemplary only.

If a node that is not the node that created the destination node name (in this example it would be any node except node A) is marked as ‘in the data stream’ and has a fCumulativeLinkCostFromStream = fCumulativeLinkCost, then it will tell all its directly connected nodes that have not marked it in the data stream an update for node E with fCumulativeLinkCostFromStream = 0. For those nodes that have marked it as in the data stream it will tell them an update for node E with fCumulativeLinkCostFromStream = fCumulativeLinkCost.

At no point in this embodiment is the fCumulativeLinkCost adjusted to match fCumulativeLinkCostFromStream. The fCumulativeLinkCostFromStream is always adjusted relative to the fCumulativeLinkCost.

Ordering of Node Updates

In a large network there can be a lot of node updates to send. This alternative embodiment allows updates to be ordered by how important they are.

This alternative embodiment assumes that fCumulativeLinkCostFromStream is used and that HSPP's are used. If only one of them is used then just ignore the ordering that the un-used embodiment would provide.

All nodes in the system are ordered by the fCumulativeLinkCostFromStreamvalue that was sent to it by the selected ‘Best Neighbour’ (the linkcost for the connection was added to the value sent by the directlyconnected node). This ordered list could take the form of a TreeMap (inthe example of Java).

If the previous embodiment is used then whenfCumulativeLinkCost=fCumulativeLinkCostFromStream and a node is markedin the data stream then it should be added to the treemap as if it had afCumulativeLinkCostFromStream of 0.

When an update to a destination node route needs to be sent to adirectly connected node this destination node is placed in a TreeMapthat is maintained for each directly connected node. The TreeMap is adata structure that allows items to be removed from in by ascending keyorder. This allows more important updates to be sent to the directlyconnected node before less important updates.

The destination nodes placed in this TreeMap are ordered by theirfCumulativeLinkCostFromStream values, except in the case where:

1. This node is in the path of an HSPP for this destination node, and this directly connected node is:
    a. In the path to the core and the HSPP is a notify HSPP, or there is only one type of HSPP.
    b. One of the nodes that told us of this HSPP and the HSPP is a request HSPP, or there is only one type of HSPP.
2. This node is in the data stream for this destination node. (bIsInTheDataStream=true and fCumulativeLinkCost=fCumulativeLinkCostFromStream)

If the destination node belongs to one of these two groups, the item is placed at the start of the ordered update list maintained for each directly connected node. Exemplar pseudo code for this process looks like this:

float fTempCumLinkCostFStream = GetCumLinkCostFStream(NodeToUpdate);
if (NodeToUpdate for this connection belongs to group 1 or 2)
    fTempCumLinkCostFStream = 0;
while (CurrentConnection.OrderedUpdateTreeMap contains fTempCumLinkCostFStream as a key) {
    Increment fTempCumLinkCostFStream by a small amount
}
Add pair (fTempCumLinkCostFStream, NodeToUpdate) to CurrentConnection.OrderedUpdateTreeMap;

The destination node route updates are then sent in this order. When a destination update has been processed it is removed from this ordered list (CurrentConnection.OrderedUpdateTreeMap). This ordering ensures that more important updates are sent before less important updates.
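
For illustration only, a minimal sketch (in C++, using std::map in place of the Java TreeMap mentioned above) of draining the per-connection ordered list in ascending key order; the names are illustrative assumptions.

#include <map>
#include <string>

// Hypothetical per-connection ordered update list: the key is the
// fTempCumLinkCostFStream value, the value identifies the destination node.
using OrderedUpdateTreeMap = std::map<float, std::string>;

// Send the most important (lowest key) pending update first and remove it,
// mirroring the removal-by-ascending-key behaviour described above.
void SendNextUpdate(OrderedUpdateTreeMap& pending) {
    if (pending.empty()) return;
    auto it = pending.begin();          // lowest fTempCumLinkCostFStream
    // SendRouteUpdate(it->second);     // assumed send routine, not defined here
    pending.erase(it);                  // processed updates leave the list
}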

The fTempCumLinkCostFStream should also be used on a per connection basis to determine which destination node updates should be sent. For example, if there are five destination nodes with fTempCumLinkCostFStream values of:

1.202 dest node D
1.341 dest node F
3.981 dest node G
8.192 dest node B
9.084 dest node M

And the directly connected node has requested a maximum of four destination node routes sent to it, then this node will only send the first four in this list (the node will not send the update for destination node M).

If destination node G has its fTempCumLinkCostFStream change from 3.981 to 12.231, the new list would look like this:

1.202 dest node D
1.341 dest node F
8.192 dest node B
9.084 dest node M
12.231 dest node G

In response to this update, this node would schedule an update for both destination node G and destination node M. The ordered pairs in CurrentConnection.OrderedUpdateTreeMap (assuming no HSPP's or Data Streams) would look like this:

Position 1: (9.084, M)
Position 2: (12.231, G)

This node would then send an infinity update for node G. It would then schedule a delayed send for destination node M. (See 'Delayed Sending'.)

An infinity destination node update makes sure that the messages needed to pass this information are sent for the node that is getting an infinity update. This example includes several different embodiments; for those that are not used, someone skilled in the art will be able to omit the relevant item(s).

a. fCumulativeLinkCost = INFINITY
b. fCumulativeLinkCostFromStream = INFINITY
c. bIsBestNeighbour = FALSE
d. bIsInDataStream = FALSE

When the update for destination node M is sent, it would be non-infinity.

Delayed Sending

This alternative embodiment helps node knowledge to be removed from the network when a node is removed from the network.

If node knowledge is not removed from the network, then a proper hierarchy and core will have trouble forming.

If this is the first time that a destination node update is being sent to a directly connected node, or the last update that was sent to this directly connected node had a fCumulativeLinkCost of infinity, the update should be delayed.

For example, if the connection has a latency of 10 ms, the update should be delayed by (Latency+1)*2 ms, or in this example 22 ms. This latency should also exceed a multiple of the delay between control packet updates (see 'Propagation Priorities').

Someone skilled in the art will be able to experiment and find good delay values for their application.
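
As a minimal illustration of the (Latency+1)*2 rule of thumb above, a delay helper might look like the following; the multiple applied to the control-packet interval is an assumed tuning parameter, not a value from the specification.

// Sketch of the delayed-send interval described above.
// latencyMs         - measured connection latency in milliseconds
// controlIntervalMs - delay between control packet updates (see 'Propagation Priorities')
double DelayedSendMs(double latencyMs, double controlIntervalMs) {
    double delay = (latencyMs + 1.0) * 2.0;      // e.g. 10 ms latency -> 22 ms delay
    double floor = controlIntervalMs * 3.0;      // assumed multiple of the control interval
    return delay > floor ? delay : floor;
}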

The exception is if either of these conditions is met:

1. This node is in the path of an HSPP for this destination node, and this directly connected node is:
    a. In the path to the core and the HSPP is a notify HSPP, or there is only one type of HSPP.
    b. One of the nodes that told us of this HSPP and the HSPP is a request HSPP, or there is only one type of HSPP.
2. This node is in the data stream for this destination node. (bIsInTheDataStream=true and fCumulativeLinkCost=fCumulativeLinkCostFromStream)

If an infinity update has been scheduled to be sent (by having it placed in the CurrentConnection.OrderedUpdateTreeMap), but has not been sent by the time a non-infinity update is scheduled to be sent (because it has been delayed), the infinity update must be sent first, and then the non-infinity update should be delayed again before being sent.

Cycling a Destination Node from Infinity to Non-Infinity

This alternative embodiment helps node knowledge to be removed from the network when a node is removed from the network.

If any of these criteria are met for a node A update being sent to a directly connected node N:

1. If the alternative embodiment that limits the number of node updates that can be sent to a node is used: the directly connected node N has been sent a non-infinity update for this destination node A, but the new fTempCumLinkCostFStream value (see above) for node A is greater than X other nodes' fTempCumLinkCostFStream values, where X is the maximum number of nodes that the directly connected node N requested to be told about.
2. If the fTempCumLinkCostFStream for node A becomes larger than any other node's fTempCumLinkCostFStream that was sent to this directly connected node N.

This node will send the directly connected node N an update of infinity for this destination node A. Then after a suitable delay (see 'Delayed Sending') this node will send a non-infinity update for this destination node to the directly connected node, except in the case where this node A still meets criterion 1.

This is part of the approach used to help remove bad route data from the network, and automatically remove loops.

An infinity update is an update with the fCumulativeLinkCost value set to infinity (see above for a more complete definition).

The decision to send an infinity update (followed some time later with a non-infinity update for the same destination node) when a destination node meets the previous criteria is a recommended approach. Alternative approaches to trigger the infinity update followed by the delayed non-infinity update include (but are not limited to):

1. When the fTempCumLinkCostFStream increases by a certain percent, or amount, in a specific period of time. For example, if the fTempCumLinkCostFStream increased by more than 10 times the connection cost in under 0.5 s.
2. When the position of this destination node in the ordered list moves more than (for example) 100 positions in the list, or moves more than (for example) 10% of the list in X seconds.

Persons skilled in the art can determine a suitable increase in fTempCumLinkCostFStream and suitable timing values in order to trigger the infinity/non-infinity send.

An alternative embodiment could use fCumulativeLinkCostFromStream instead of fTempCumLinkCostFStream.

If a previously unknown destination node appears at the top of the list, the infinity does not need to be sent because the directly connected nodes have not been told a non-infinity update before. However, telling the directly connected nodes about this destination node should be delayed.

This delayed sending does not need to occur if either of these conditions is met:

1. This node is in the path of an HSPP for this destination node, and this directly connected node is:
    a. In the path to the core and the HSPP is a notify HSPP, or there is only one type of HSPP.
    b. One of the nodes that told us of this HSPP and the HSPP is a request HSPP, or there is only one type of HSPP.
2. This node is in the data stream for this destination node. (bIsInTheDataStream=true and fCumulativeLinkCost=fCumulativeLinkCostFromStream)

End User Software

This network system and method can be used to emulate most other network protocols, or as a base for an entirely new network protocol.

It can also serve as a replacement for the 'routing brains' of other protocols.

In this document TCP/IP will be used as an example of a protocol that can be emulated. The use of TCP/IP as an example is not meant to limit the application of this invention to TCP/IP.

In TCP/IP when a node is turned on, it does not announce its presence to the network. It does not need to because the name of the node (IP address) determines its location. In the present invention, the node needs the network to know that it exists, and to provide the network with a guaranteed path to itself. This is discussed in much greater detail elsewhere.

When end user software ("EUS") wishes to establish a connection, it could do so in a manner very similar to TCP/IP. In TCP/IP the connection code looks similar to this:

SOCKET sNewSocket=Connect(IP Address, port);

With this invention, the 'IP Address' is replaced with a Globally Unique Identifier ("GUID").

SOCKET sNewSocket=Connect(GUID,port).

In fact, if the IP Address can be guaranteed to be unique, then the IP address could serve as the GUID, providing a seamless replacement of an existing TCP/IP network stack with this new network invention.

One way to guarantee a unique IP is to have each node create a random GUID and then use that to communicate with a DHCP (Dynamic Host Configuration Protocol) like server to request a unique IP address that can be used as a GUID. The node would then discard its first GUID name and use only this IP address as a GUID. Using IP addresses in this context would mean that IP addresses would not necessarily need to reflect a node's position in the network hierarchy.

Once a connection to a destination node has been requested, the network will determine a route to the destination (if such a route exists), and continually improve the route until an optimal route has been found.

The receiving end will look identical to TCP/IP, except a request to determine the IP address of the connecting node will yield a GUID instead (or an IP address if those are being used as GUIDs).

This approach provides the routing through the network; someone skilled in the art could see how different flow control approaches might work better in different networks. For example, a wireless network might need an approach that does not lose packets when incoming data rates exceed outgoing connection rates.

FIG. 28 is an example of where this routing method would fit into the TCP/IP example.

Persons skilled in the art will appreciate that this new routing approach allows a TCP/IP like interface for end user applications. This is an example not meant to limit this routing approach to any particular interface (TCP/IP for example) or application.

Connecting Two Nodes Across the Network

The following is an example of one approach that can be used to connect two nodes in this network. This example (like all examples in this document) is not meant to limit the scope of the patent. Someone skilled in the art would be aware of many variations.

If the alternative embodiment that uses HSPP's is not used then ignore the parts about HSPP's.

If node A wishes to establish a connection with node B, it will first send out a request HSPP (discussed earlier) to all directly connected nodes. This request HSPP will draw and maintain route information about node B to node A. This request HSPP will be sent out even if node A already has knowledge of node B.

If the alternative embodiment that uses 'priority notify hspp' is used then node A can change its notify hspp to a priority notify hspp and inform all its directly connected nodes. This can help connectivity in low bandwidth mobile environments since it would allow nodes that are communicating to have their information spread before those nodes that are not communicating.

This HSPP will travel to the core even if it encounters node route knowledge before reaching the core.

Once Node A has a non-infinity next best step to node B it will send out a 'connection request message' to the specified port on node B. This request will be sent to the directly connected node that has been selected as the 'Best Neighbour' for messages going to node B.

If the 'marking the data stream' embodiment is used then node A will tell the directly connected node that it has selected as a 'Best Neighbour' that it is in the data stream for node B.

The use of ports is for example only and is not meant to limit the scope of this invention. A possible alternative could be a new node name specifically for incoming connections. Someone skilled in the art would be aware of variations.

Node A will keep sending this message every X seconds (for example 15 seconds), until a sConnectionAccept message has been received, or a timeout has been reached without reception (for example 120 seconds). The connection request message might contain the GUID of node A, and what port to send the connection reply message to. It may also contain a nUniqueRequestID that is used to allow node B to detect and ignore duplicate requests from node A.

The connection request message looks like this (for example):

struct sConnectionRequest {
    // the name of node A, could be replaced with a number
    // for reduced overhead.
    sNodeName nnNameA;
    // Which port on node A to reply to
    int nSystemDataPort;
    // Which port to send end user messages to on node A
    int nUserDataPort;
    // a unique request id that node B can use to
    // decide which duplicate requests to ignore
    int nUniqueRequestID;
}

When node B receives the 'connection request' message from node A it will generate a request HSPP for node A and send it to all directly connected nodes. This will draw and maintain route information about node A to node B.

If the alternative embodiment that uses 'priority notify hspp' is used then node B can change its notify HSPP to a priority notify HSPP and inform all its directly connected nodes. This can help connectivity in low bandwidth mobile environments.

If the alternative embodiment 'in the data stream' is used then Node B will wait until it sees where its next best step to node A is, and then mark the route to node A as 'In the data stream'.

Node B will then send a sConnectionAccept message to node A on the port specified (sConnectionRequest.nSystemDataPort). This message looks like this:

struct sConnectionAccept {
    // the name of node B
    sNodeName nnNameB;
    // the port for user data on node B
    int nUserDataPortB;
    // the unique request ID provided by A in the
    // sConnectionRequest message
    int nUniqueRequestID;
}

The sConnectionAccept message will be sent until node A sends a sConnectionConfirmed message that is received by node B, or a timeout occurs.

The sConnectionConfirmed message looks like this:

struct sConnectionConfirmed {
    // the name of node A, could be replaced with a number
    // for reduced overhead.
    sNodeName nnNameA;
    // the unique request ID provided by A in the
    // sConnectionRequest message
    int nUniqueRequestID;
}

If a timeout occurs during the process the connection is deemed to have failed and will be dismantled. The request HSPP's that both nodes have generated will be removed, and the 'in the data stream' flag(s) will be removed (if they were added).
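
Purely as an illustration of the retry behaviour described above (resend roughly every 15 seconds, give up after 120 seconds), a sketch follows; the send routine, the stub names and the timing constants are assumptions, not part of the specification.

#include <chrono>
#include <thread>

// Stand-ins for the node's actual messaging routines (assumptions for illustration).
void SendConnectionRequest() { /* send sConnectionRequest to the Best Neighbour */ }
bool AcceptReceived()        { return false; /* true once sConnectionAccept arrives */ }

// Hypothetical retry loop for the connection request described above:
// resend every 15 seconds until accepted or a 120 second timeout expires.
bool EstablishConnection() {
    using namespace std::chrono;
    const auto resendInterval = seconds(15);   // exemplar resend period
    const auto timeout        = seconds(120);  // exemplar overall timeout
    const auto start          = steady_clock::now();

    while (steady_clock::now() - start < timeout) {
        SendConnectionRequest();
        std::this_thread::sleep_for(resendInterval);
        if (AcceptReceived())
            return true;                       // connection established
    }
    return false;                              // timed out: dismantle the connection
}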

Once the connection is established, both nodes may send user data messages to each other's respective ports. These messages would then be routed to the end user software via sockets (in the case of TCP/IP).

An alternative embodiment would not require a connection to be established, just the sending of EUS messages/payload packets when a route to the destination node has been located.

Node Name Optimization and Messages

This alternative embodiment can be used to optimize messages and name passing.

Every node update and EUS message/payload packet needs to have a way to identify which destination node it references. Node names and GUIDs can easily be long, and inefficient to send with every message and node update. Nodes can make these sends more efficient by using numbers to represent long names.

For example, if node A wants to tell node B about a destination node named 'THISISALONGNODENAME.GUID', it could first tell node B that:

1='THISISALONGNODENAME.GUID'

A structure for this could look like (for example):

struct sCreateQNameMapping {
    // size of the name for the node
    int nNameSize;
    // name of the node
    char cNodeName[Size];
    // the number that will represent this node name
    int nMappedNumber;
};

Then instead of sending the long node name each time it wants to send a destination node update or message, it can send a number that represents that node name (sCreateQNameMapping.nMappedNumber). When node A decides it no longer wants to tell node B about the destination node called 'THISISALONGNODENAME.GUID', it could tell B to forget about the mapping.

That structure would look like:

struct sRemoveQNameMapping { int nMappedNumber; };

Each node would maintain its own internal mapping of what names mapped to which numbers. It would also keep a translation table so that it could convert a name from a directly connected node to its own naming scheme. For example, a node A might use:

1=‘THISISALONGNODENAME.GUID’

And node B would use:

632=‘THISISALONGNODENAME.GUID’

Thus node B would have a mapping that would allow it to convert node A's numbering scheme to a numbering scheme that makes sense for node B. In this example it would be:

Node A      Node B
1           632
. . .       . . .

Using this numbering scheme also allows messages to be easily tagged as to which destination node they are destined for. For example, if the system had a message of 100 bytes, it would reserve the first four bytes to store the destination node name the message is being sent to, followed by the message. This would make the total message size 104 bytes. An example of this structure also includes the size of the message:

struct sMessage {
    // the number that maps to the name of the node where
    // this message is being sent to
    int uiNodeID;
    // the size of the payload packet
    int uiMsgSize;
    // the actual payload packet
    char cMsg[uiMsgSize];
}

When this message is received by a node, that node would refer to its translation table and convert the destination mapping number to its own mapping number. It can then use this mapping number to decide if this node is the destination for the payload packet, or if it needs to send this payload packet to another directly connected node.
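
The following is a minimal sketch of that translation step, assuming a simple in-memory table; the container, parameter names and the local node identifier are illustrative only.

#include <unordered_map>

// Hypothetical translation table for one directly connected node:
// maps the neighbour's mapping number to this node's own mapping number.
using TranslationTable = std::unordered_map<int, int>;

// Convert an incoming destination number and decide whether the payload
// is for this node or must be forwarded, as described above.
// localNodeID is an assumed constant identifying this node's own mapping number.
bool IsForThisNode(const TranslationTable& fromNeighbour, int& uiNodeID, int localNodeID) {
    auto it = fromNeighbour.find(uiNodeID);
    if (it != fromNeighbour.end())
        uiNodeID = it->second;         // rewrite into this node's numbering scheme
    return uiNodeID == localNodeID;    // otherwise forward to the chosen neighbour
}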

These quick destination numbers could be placed in a TCP/IP header by someone skilled in the art.

When to Remove Name Mappings

This alternative embodiment is used to help remove name mappings that are no longer needed.

If a destination node has a fCumulativeLinkCost of infinity continuously for more than X ms (for example 5000 ms) and it has sent this update to all directly connected nodes, then this node will remove knowledge of this destination node.

First it will release all the memory associated with this destination node, and the updates that were provided to it by the directly connected node. It will also remove any messages that this node has waiting to send to that destination node.

Next it will tell its directly connected nodes to forget the number->name mapping for this destination node.

Once all of the directly connected nodes tell this node that it too can forget about their number->name mappings for this destination node, then this node can remove its own number->name mapping.

At this stage there is no longer any memory associated with this destination node.

A node should attempt to reuse forgotten internal node numbers before using new numbers.

Simpler Fast Routing

This alternative embodiment is used to speed up the routing of packets.

An optimization would be to add another column to the name mapping table indicating which directly connected node will be receiving the message:


Node A      Node B      Directly Connected Node the Message is Being Sent To
1           632         7
. . .       . . .       . . .

This allows the entire routing process to be one array lookup. If node A sent a message to node B with a destination node 1, the routing process would look like this:

1. Node B creates a pointer to the mapping in question:
    sMapping *pMap = &NodeMapping[pMessage->uiNodeID];
2. Node B will now convert the name:
    pMessage->uiNodeID = pMap->uiNodeBName;
3. And then route the message to the specified directly connected node:
    RouteMessage(pMessage, pMap->uiDirectlyConnectedNodeID);

For this scheme to work correctly, if a node decides to change which directly connected node it will route messages for a destination to, it will need to update these routing tables for all directly connected nodes.

If the directly connected nodes reuse internal node numbers, and the number of destination nodes that these nodes know about is less than the amount of memory available for storing these node numbers, then the node can use array lookups for sending messages.

This will provide the node with O(1) message routing (see above). If the node numbers provided by the directly connected nodes exceed the size of memory available for the lookup arrays (but the total node count still fits in memory), the node could shift from using an array lookup to using a hashmap lookup.
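
Below is a minimal sketch of that array-first, hash-map-fallback lookup, with illustrative names; it is not the specification's data layout.

#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical per-neighbour routing entry (see the pseudo code above).
struct sMapping {
    uint32_t uiLocalName;                 // this node's own mapping number
    uint32_t uiDirectlyConnectedNodeID;   // which neighbour the message is forwarded to
};

// Array lookup while the neighbour's numbers fit in the preallocated table,
// falling back to a hash map once they do not.
struct RoutingTable {
    std::vector<sMapping> array;                      // O(1) lookup when the index fits
    std::unordered_map<uint32_t, sMapping> overflow;  // fallback for larger numbers

    const sMapping* Find(uint32_t uiNodeID) const {
        if (uiNodeID < array.size())
            return &array[uiNodeID];
        auto it = overflow.find(uiNodeID);
        return it == overflow.end() ? nullptr : &it->second;
    }
};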

More Complex Fast Routing

This alternative embodiment helps ensure that O(1) routing can be used and avoids the use of a hash map.

In order to ensure that nodes can always perform the fast O(1) array lookup routing, a node could provide each directly connected node with unique node number->name mappings. This will ensure that the directly connected node won't need to resort to using a hash table to perform message routing (see above).

When generating these unique number->name mappings for the directly connected node, this node would make sure to re-use all numbers possible.

By reusing these numbers, it ensures that the highest number used in the mappings should never greatly exceed the maximum number of destination node updates requested by that directly connected node.

For each connection the node will need to create an array of integers, where the offset corresponds to the node's own internal node ID, and the number stored at that offset is the unique number->name mapping used for that directly connected node.

The fast routing would then look like this (see above):

1. Node B creates a pointer to the mapping in question:
    sMapping *pMap = &NodeMapping[pMessage->uiNodeID];
2. Node B will now convert the name:
    pMessage->uiNodeID = pMap->uiNodeBName;
3. And then route the message to the specified directly connected node:
    RouteMessage(pMessage, pMap->uiDirectlyConnectedNodeID);
4. Before sending, the name will get changed one final time:
    pMessage->uiNodeID = UniqueNameMapping[uiConnectionID][pMessage->uiNodeID];

Where UniqueNameMapping is a two dimensional array with the first parameter being the connection ID, and the second is the number used in the unique number->name mapping for the connection with that connection ID.

Reusing the numbers used in the number->name mappings for each directly connected node will require an array that is the same size as the maximum number of mappings that will be used. The array will be treated as a stack, with the numbers to be reused being placed in this stack. An offset into the stack will tell this node where to place the next number to be reused and where to retrieve numbers to be re-used.
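
A minimal sketch of such a fixed-size reuse stack follows; the class and member names are illustrative assumptions.

#include <cstddef>
#include <vector>

// Hypothetical stack of reusable mapping numbers, sized to the maximum
// number of mappings that will ever be used for one directly connected node.
class ReuseStack {
public:
    explicit ReuseStack(std::size_t maxMappings) : numbers(maxMappings), top(0) {}

    // A mapping number that is no longer in use is pushed for reuse.
    void Release(int number) { numbers[top++] = number; }

    // Prefer a reused number; fall back to a brand new one only when empty.
    int Acquire(int& nextNewNumber) {
        if (top > 0)
            return numbers[--top];
        return nextNewNumber++;
    }

private:
    std::vector<int> numbers;  // backing array, treated as a stack
    std::size_t top;           // offset of the next free slot in the stack
};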

If the directly connected node has requested a maximum number of destination nodes that is greater than the total number of destination nodes known about by this node, then a unique mapping scheme is not needed for that directly connected node.

If this circumstance changes, a unique mapping scheme can easily be generated by someone skilled in the art.

When to Send User Data Packets

A user data packet can be sent whenever there is a route available. Alternative embodiments could allow for QOS where certain classes of nodes had their user data packets sent first. Someone skilled in the art would be aware of variations.

If no route is immediately available, a payload packet could be held for some amount of time in hopes that a valid route would appear.

A Time-To-Live (TTL) scheme may also be implemented by someone skilled in the art. Instead of using hops (as in a protocol like TCP/IP), the TTL might be a multiple of the fCumulativeLinkCost value for the destination node that is calculated by the node that creates the payload packets. Each node will then subtract its LinkCost for the link that the packet is received on from the TTL. If the TTL goes below zero the packet could be removed. Someone skilled in the art will recognize this as a standard TTL scheme with the use of link costs instead of hop counts (as in TCP/IP).
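
One possible reading of that link-cost TTL, expressed as a short sketch; the initial multiplier of 2 is an assumed tuning parameter, not a value from the text.

// Sketch of the link-cost based Time-To-Live described above.
// The packet's TTL starts as a multiple of the originator's fCumulativeLinkCost
// to the destination (the multiplier of 2 here is an assumption for illustration).
float InitialTTL(float fCumulativeLinkCostToDest) {
    return 2.0f * fCumulativeLinkCostToDest;
}

// Each forwarding node subtracts the link cost of the link the packet
// arrived on; a packet whose TTL falls below zero is dropped.
bool DecrementAndCheck(float& fTTL, float fIncomingLinkCost) {
    fTTL -= fIncomingLinkCost;
    return fTTL >= 0.0f;   // false means the packet should be removed
}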

Allowing more Important Nodes to Spread Further

This alternative embodiment will allow some nodes to spread further and/or faster through the network.

For this embodiment to work well, most (if not all) nodes in the network will need to follow the same rules.

For a class of nodes that is marked as 'more important', only a fraction of the link cost will be added to their fCumulativeLinkCost and/or fCumulativeLinkCostFromStream values. For example, if a node D was in the class of more important nodes, and the usual link cost for a link N was 10, then the update for node D would only have (for example) 5, or half of the link cost, added to its fCumulativeLinkCost and/or fCumulativeLinkCostFromStream (see the sketch following the list below).

How important a node is might be linked to:

1. Its magnitude (discussed earlier).
2. An arbitrary scheme based on the name of the node.
3. Another value or combination of values added to or present in the node update structure.
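
Purely as an illustration of the fractional link-cost rule above, the importance factor here is an assumed per-class value between 0 and 1 (0.5 reproduces the 'half of the link cost' example).

// Sketch: an 'important' class of node has only a fraction of the link cost
// added to its advertised cumulative cost values.
// fImportanceFactor is an assumed per-class value in (0, 1]; 1.0 means a normal node.
float AddLinkCost(float fCumulativeCost, float fLinkCost, float fImportanceFactor) {
    return fCumulativeCost + fLinkCost * fImportanceFactor;
}

// Example: link cost 10, factor 0.5 -> only 5 is added, as in the node D example above.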

Congested or Energy Depleted Nodes

This alternative embodiment can help shift network traffic away from overly congested nodes, or nodes that are running low on battery.

If a node was running low on energy, or was experiencing congestion, it could increase its link costs. This would help shift traffic away from this node while it is experiencing problems.

It is helpful that the node shifts its link values slowly and waits between changes. This will help avoid unstable network oscillations.

If a node experiences reduced congestion or its battery situation improves then it should slowly lower its link costs back to normal.

Someone skilled in the art will be aware of the oscillation problems and be aware of schemes to deal with these problems.

Nodes Sharing a Name

This alternative embodiment allows nodes to share a name.

In the case of a web server (for example), it can become important to provide more bandwidth and connectivity than a single server can provide.

If two or more nodes were to use the same name then nodes attempting to connect to a node with that name would connect to the closest node (based on fCumulativeLinkCost).

If the request was stateless (for example requesting the main page of a web site) a node could then send the request immediately, since no matter which node the request got routed to, the same result would be returned.

If the node required a stateful connection, then it would initially connect to the closest node with that name. That closest node would then return its unique name, which could be used to establish a connection that needed state.

For example, in the sConnectionAccept struct discussed earlier the name of the node returned (sConnectionAccept.nnNameB) could be the unique name of the node.

Propagation Priorities

In a larger network, bandwidth throttling for control messages will need to be used.

Total 'control' bandwidth should be limited to a percent of the maximum bandwidth available for all data.

For example, we may specify 5% of maximum bandwidth for each group, with a minimum size of 4K. On a simple 1 MB/s connection this would mean that we'd send a 4K packet of information every:

4096 bytes / (1 MB/s * 0.05) = 0.0819 s

So on this connection we'd be able to send a control packet every 0.0819 s, or approximately 12 times every second.
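
For illustration, the interval calculation could be expressed as below; the parameter names are assumptions and the figures in the comment are the exemplar values from the text.

// Sketch of the control-packet interval described above.
// blockBytes      - minimum control block size (e.g. 4096 bytes)
// linkBytesPerSec - total bandwidth of the connection in bytes/second
// controlFraction - fraction reserved for this control group (e.g. 0.05)
double ControlPacketIntervalSeconds(double blockBytes, double linkBytesPerSec,
                                    double controlFraction) {
    return blockBytes / (linkBytesPerSec * controlFraction);
    // e.g. 4096 / (1,000,000 * 0.05) = 0.0819 s, about 12 control packets per second
}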

The percentages and sizes of blocks to send are examples, and can be changed by someone skilled in the art to better meet the requirements of their application.

Bandwidth Throttled Messages

These messages should be concatenated together to fill the block size used for control messages.

If a control message references a destination node name by its quick-reference number, and the directly connected node does not know that number, then the quick reference (number->name mapping) should precede the message.

There should be a split between the amount of control bandwidth allocated to route updates and the amount of control bandwidth allocated to HSPP updates.

For example, 75% of the control bandwidth could be allocated to route updates and the remaining 25% could be allocated to HSPP updates. Someone skilled in the art could modify these numbers to better suit their implementation.

Multiple Path Networks

This section of the document describes an embodiment that allows multiple paths for end user data to form between two communicating nodes. This embodiment also allows for paths to move and shift to avoid congestion.

This embodiment uses the idea of nodes and queues. Queues are used as destinations for messages (in the way that node names were used in the previous section of the document). The terminology in this section of the document may be slightly different from above, however someone skilled in the art will be able to tell which terms are equivalent.

Any definitions or concepts provided in this section of the document should not be seen as invalidating or changing the meaning of definitions or concepts in the preceding part of this document.

Someone skilled in the art would be aware of variations that would be possible.

This network does not rely on any agent possessing global knowledge of the network.

The constituents of the network are nodes and queues.

This network holds to several principles:

General Principles

1. The network will use simple concepts.
2. Decision-making and knowledge will be kept local, avoiding the need for global knowledge.
3. There shall be no limits on system size or topography, nodal capacity, or structural flexibility.

These principles govern the design of the network. The operation of these principles is explained in detail later.

Particular Principles

1. A node will only send messages to directly connected nodes that it has specified as chosen destinations.
2. A node will only send a message to a chosen destination if the latency of data in the queue on that node is greater than the latency of that chosen destination minus the minimum latency of all chosen destinations.
3. Nodes not currently in the data stream have only one chosen destination. Nodes in the data stream can have multiple chosen destinations.
4. When looking for a better chosen destination, nodes not in the data stream use passive loop checking, while nodes in the data stream use active loop checking.
5. Connections are established and maintained in a TCP/IP manner.
6. Nodes in large networks look for knowledge in the core of the network.

Data Flow Principles

1. A stream of data must not cause its own path latencies to change, except in the case where the flow is past capacity.

It is to be reiterated that examples are given herein in order to clarify understanding. These examples, when making specific reference to numbers, other parties' software or other specifics, are not meant to limit the generality of the method and system described herein.

Nodes

Each node in this network is directly connected to one or more other nodes. A node could be a computer, network adapter, switch, or any device that contains memory and an ability to process data. Each node has no knowledge of other nodes except those nodes to which it is directly connected. A connection between two nodes could be several different connections that are 'bonded' together. The connection could be physical (wires, etc), actual physical items (such as boxes, widgets, liquids, etc), computer buses, radio, microwave, light, quantum interactions, etc.

No limitation on the form of connection is implied by the inventors.

In FIG. 29 Node A is directly connected to nodes B and C. Node C is only connected to Node A. Node B is directly connected to four nodes.

'Chosen Destinations' are a subset of all directly connected nodes. Only 'Chosen Destinations' will ever be considered as possible routes for messages (discussed later).

'Chosen Destinations' is equivalent to 'Best Neighbours'. The term is used in this section of the document since 'Best Neighbour' may be somewhat misleading: there can only really be one 'best neighbour', whereas there can be multiple 'chosen destinations'.

Queues and Messages

Communication by end user software (EUS) is performed using queues. Queues are used as the destination for EUS messages/payload, as well as for messages that are used to establish and maintain reliable communication. Every node that is aware of the existence of a queue has a corresponding queue with an identical name. This corresponding queue is a copy of the original queue, however the contents of queues on different machines will be different.

Messages are transferred between nodes using queues of the same name. A message will continue to be transferred until it reaches the original queue. The original queue is the queue that was actually created by the EUS, or the system, to be the message recipient.

A node that did not create the original queue does not know which node created the original queue.

Each queue created in the system is given a unique label that includes an EUS or system assigned queue number and a globally unique identifier (GUID). The GUID is important, because it guarantees that there is only ever one originally created queue with the same name. For example:

-   Format: EUSQueueNumber.GUID
-   Example: 123456.af9491de5271abde526371

Alternative implementations could have several numbers used to identify the particular queue. For example:

-   Format: EUSAppID.EUSQueueNumber.GUID
-   Example: 889192.123456.af9491de5271abde526371

Each node can support multiple queues. There is no requirement that specific queues need to be associated with specific nodes. A node is not required to remember all queues it has been told about.

If a node knows about a queue it will tell those nodes it is connected to about that queue (discussed in detail later). The only node that knows the final destination for messages in a queue is the final destination node that created that queue originally. A node assumes any node it passes a message to is the final destination for that message.

At no point does any node attempt to build a global network map, or have any knowledge of the network as a whole except of the nodes it is directly connected to. The only knowledge it has is that a queue exists, how long a message will take to reach the node that originally created that queue, and the maximum time a latency update from the original node will take to reach this node.

Latencies

Latencies play a central role in choosing the best path for data in the network. When node B tells node A that its latency is X seconds, it is saying that if node A were to pass a message to node B, that message would take X seconds to arrive at the ultimate destination and be de-queued by the EUS.

This latency value as calculated by node B is:

Latency = MinOverTimePeriod([Bytes In Queue]) / [Bytes/Second Send Rate] + [Lowest Latency of All Chosen Message Destinations] + [Service Time on This Queue] + [Physical Network Latency]

Min Over Time Period is a period of time determined by the time it takes to perform a minimum of five sends or receives from the send and receive nodes associated with this queue. It is also a minimum time of 30 ms (or a reasonable multiple of the granularity of the fast system timer). This will be discussed in more detail later.

Bytes/Second Send Rate is the best estimate of the rate of data flowing out of the queue on this node.

Lowest Latency of All Chosen Message Destinations: all directly connected nodes with knowledge of this queue will provide a latency to the node that originally created the queue. This is the lowest latency of all those nodes that are chosen destinations for this queue.

Service Time On This Queue is the time it takes for the node to attempt to send data from all other queues before it comes back to this one, excluding this particular queue.
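
As a sketch only, the latency that node B advertises could be computed as below; the structure and field names are illustrative, and the first term is read as the queued bytes divided by the send rate, following the definitions above.

// Sketch of the latency node B advertises for a queue, per the formula above.
// All inputs are assumed/illustrative names, not the specification's structures.
struct QueueLatencyInputs {
    float fMinBytesInQueueOverPeriod;   // MinOverTimePeriod([Bytes In Queue])
    float fBytesPerSecondSendRate;      // estimated outflow rate for this queue
    float fLowestChosenDestLatency;     // lowest latency among chosen destinations
    float fServiceTimeOnThisQueue;      // time spent servicing the other queues
    float fPhysicalNetworkLatency;      // 'ping'-like latency to the chosen destination
};

float AdvertisedLatencySeconds(const QueueLatencyInputs& in) {
    float fDrainTime = in.fMinBytesInQueueOverPeriod / in.fBytesPerSecondSendRate;
    return fDrainTime
         + in.fLowestChosenDestLatency
         + in.fServiceTimeOnThisQueue
         + in.fPhysicalNetworkLatency;
}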

FIG. 30 illustrates how service time could be calculated.

Calculation of Service Time on a Queue

For each directly connected node there is a list of queues that have that node as their chosen destination, and have data in their queue to send.

In order to service queues fairly, the system will cycle through these queues, sending messages from them to a 'chosen destination' in a round robin fashion. Each 'chosen destination' will have its own list of queues that it will cycle through on its own.

If quality of service (QOS) were to be implemented, this order of processing could be shifted to process the 'more important' queues more often.

Some types of nodes will have system timers with different resolutions available. Many times the low resolution timer is much faster to read the time from, thus it makes sense to use the lower resolution timer to increase the performance of a node.

The tradeoff is a slightly more complex algorithm for determining the service time on a queue.

As the system cycles through the list of queues for a chosen destination, it will record the number of times it was able to send a message from a particular queue by incrementing a counter associated with that queue. It will only increment this counter if it is able to pass a message from this queue to a network adapter associated with that chosen destination. It will only pass a message from that queue if the appropriate queue tokens are available, and it passes the latency test. Both of these concepts will be defined later.

The node will also record how many messages it was able to send from all the queues.

It will keep looping through this round-robin process for at least 3 or 4 ticks of the low resolution timer. In the case of Windows 2000, to take one case but not to reduce the generality of this application, this would be approximately 45 or 60 milliseconds. For increased precision of this calculation, the number of ticks should be increased.

Once this time period has elapsed for a directly connected node, it will record these statistics:

1. The total number of messages sent to this directly connected node.

2. The total time in seconds that this process took.

Each queue will also have recorded the number of messages that were sent from that queue to that particular chosen destination during that time period.

Two iterations of these statistics are stored: the one currently in progress and the last complete one. This allows the node to calculate the service time for the queue while continuing to gather new data for the next service time value.

To calculate the service time for this queue with this particular chosen destination (CD) this equation is used:

ServiceTime = ([TotalMessagesSentToCD] - [TotalMessagesSentFromQToCD]) / [TotalMessagesSentToCD] * [TotalTimeInSecondsForIterations]

If there are multiple chosen destinations, the following equation is used to derive the service time:

Service Time = 1 / (1/[CD1Time] + 1/[CD2Time] + . . .)

This value is only calculated when it is being sent as part of a latency calculation. This reduces computational overhead.
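
For illustration, those two formulas could be implemented roughly as follows; the function names, parameter names and the zero-divide guard are assumptions.

#include <vector>

// Sketch of the service time calculation for one queue, per the equations above.
// Inputs are the statistics gathered over the last complete measurement period.
float ServiceTimeForOneCD(int totalMessagesSentToCD,
                          int totalMessagesSentFromQueueToCD,
                          float totalTimeInSeconds) {
    if (totalMessagesSentToCD == 0) return 0.0f;   // assumed guard, not in the text
    return float(totalMessagesSentToCD - totalMessagesSentFromQueueToCD)
         / float(totalMessagesSentToCD) * totalTimeInSeconds;
}

// With several chosen destinations the per-CD times combine harmonically.
float CombinedServiceTime(const std::vector<float>& perCDTimes) {
    float fSumOfInverses = 0.0f;
    for (float t : perCDTimes) fSumOfInverses += 1.0f / t;
    return 1.0f / fSumOfInverses;
}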

Physical Network Latency

This is defined as:

-   The time needed to send a packet similar to the average message size to a directly connected node, and have that packet be received by that directly connected node.

This value is very similar to the value of 'ping' in a traditional TCP/IP network.

This physical network latency is added to the latency provided to directly connected nodes, every time a calculation is performed using the latency that is provided by a directly connected node. For example, physical network latency would be used when:

1. Determining which is the lowest latency chosen destination

2. Detecting loops passively when not in the data stream. (definedlater)

3. Picking additional chosen destinations

This value can be initialized by sending and timing a series of predefined packets to the directly connected node.

During operation of the system this value is re-calculated based on actual performance.

Assuming the network card is continuously sending data, all the system needs to do is record the amount of data sent, the average size of message and how much time elapses. The equation would look like:

Physical Network Latency = [AverageMsgSize] / ([TotalBytesSentDuringPeriod] / [ElapsedTime])

-   The time period should be chosen to be similar to the time period used to calculate service time.

End User Software

Unlike conventional networks where each machine has an IP address and ports that can be connected to, this system works on the concept of queues.

When the end user software (EUS) creates a queue, it is similar to opening a port on a particular machine. However, there are several differences:

1. When connecting to a queue all you need is the name of the queue (for example: QueueName.GUID as discussed previously), unlike TCP/IP where the IP address of the machine and a port number is needed. The name of the queue does not necessarily bear any relationship to the node, the node's identity or its location either physically or in the network.
2. In TCP/IP when a node is connected to the network it does not announce its presence. Under this new system when a node is connected to the network it only tells its directly connected neighbors that it exists. This information is never passed on.
3. When a port is opened to receive data under TCP/IP this is not broadcast to the network. With the new system when a queue is created the entire network is informed of the existence of this queue, in distinct contrast to the treatment of nodes themselves, as described in '2' immediately above. The queue information is propagated with neighbor to neighbor communication only.

These characteristics allow EUS' to have connections to other EUS' without any information as to the network location of their respective nodes.

In order to set up a connection between EUS' a handshake protocol similar to TCP/IP is used.

1. Node A: Creates QueueA1 and sends a message to QueueB with a request to open communication. It asks for a reply to be sent to QueueA1. The request would have a structure that looks like this:

struct sConnectionRequest {
    // queue A1 (could be replaced with a number -
    // discussed later)
    sQNameType qnReplyQueueName;
    // update associated with queue A1 (explained
    // later). Includes Latency, UpdateLatency, etc.
    sQUpdate quQueueUpdate;
}

As this message travels through the network it will also bring along the definition for queue A1. This way, when this message arrives there is already a set of nodes that can move messages from Node B to queue A1.

If Node A has not seen a reply from node B in queue A1, and queue A1 on node A is not marked 'in the data stream' (indicating that there is an actual connection between node B and queue A1), and it still has non-infinity knowledge of queue B (indicating that queue B, and thus node B, still exists and is functioning), it will resend this message.

It will resend the message every 1 second, or every 'Queue B Latency' seconds, whichever is longer.

Node B will of course ignore multiple identical requests.

If any node has two identical requests on it, that node will delete all except one of these requests.

2. Node B: Sends a message to Queue A1 saying: I've created a special Queue B1 for you to send messages to. I've allocated a buffer of X bytes to re-order out-of-order messages.

struct sConnectionReply {
    // queue B1
    sQNameType qnDestQueueForMessages;
    // update associated with queue B1 (explained
    // later). Includes Latency, UpdateLatency, etc.
    sQUpdate quQueueUpdate;
    // buffer used to re-order incoming messages
    integer uiMaximumOutstandingMessageBytes;
}

As this message travels through the network it will also bring along the definition for B1. As a result of this mechanism, when this message arrives there will already be a set of nodes that can move messages from Node A to queue B1.

If Node B does not see a reply from node A in queue B, and queue B1 on node B is not 'in the data stream', and node B still has non-infinity knowledge of queue A1, it will resend this message.

It will resend the message every 1 second, or every 'Queue A1 Latency' seconds, whichever is longer.

Node B will continue resending this message until it receives a sConfirmConnection message, and queue B1 is marked 'in the data stream'.

Node B will of course ignore multiple identical sConfirmConnection replies.

If any node has two or more identical replies on it, that node will delete all except one.

3. Node A: Whenever node A receives a sConnectionReply from node B on queue A1, and it has knowledge of queue B1, it will send a reply to queue B indicating a connection is successfully set up.

struct sConfirmConnection {
    // the queue being confirmed
    sQNameType qnDestQueueForMessages;
}

If any node has two identical sConfirmConnection messages on it, that node will delete all except one of these messages.

By attaching the queue definitions to the handshake messages, the time overhead needed to establish a connection is minimized. It is minimized because the nodes do not need to wait for the queue definition to propagate through the network before being able to send.

Node A can then start sending messages. It must not have more than the given buffer size of bytes in flight at a time. Node B sends acknowledgements of received messages from node A. Node B sends these acknowledgements as messages to queue A1.

An example of the arrangement of nodes and queues looks like FIG. 31.

Acknowledgements of sent messages can be represented as a range of messages. Acknowledgments will be coalesced together. For example the acknowledgement of message groups 10-35 and 36-50 will become acknowledgement of message group 10-50. This allows multiple acknowledgements to be represented in a single message.

The structure of an acknowledgement message looks like:

struct sAckMsg {
    integer uiFirstAckedMessageID;
    integer uiLastAckedMessageID;
}
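
A small sketch of that coalescing rule, assuming message IDs are contiguous integers; the structure and function names are illustrative only.

// Sketch: merge an incoming acknowledgement range into an existing one when
// they touch or overlap, as in the 10-35 plus 36-50 -> 10-50 example above.
struct AckRange {
    int firstAckedMessageID;
    int lastAckedMessageID;
};

bool TryCoalesce(AckRange& existing, const AckRange& incoming) {
    bool adjacentOrOverlapping =
        incoming.firstAckedMessageID <= existing.lastAckedMessageID + 1 &&
        existing.firstAckedMessageID <= incoming.lastAckedMessageID + 1;
    if (!adjacentOrOverlapping) return false;   // must be sent as a separate sAckMsg
    if (incoming.firstAckedMessageID < existing.firstAckedMessageID)
        existing.firstAckedMessageID = incoming.firstAckedMessageID;
    if (incoming.lastAckedMessageID > existing.lastAckedMessageID)
        existing.lastAckedMessageID = incoming.lastAckedMessageID;
    return true;                                // ranges merged into one
}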

Acknowledgements (ACKs) are dealt with in a similar way to TCP/IP. If a sent message has not been acknowledged within a multiple of the average ACK time of the messages sent to the same 'chosen destination', then the message will be resent.

Messages are stored on the node where the EUS created them, until they have been acknowledged. This allows the messages to be resent if they were lost in transit.

If the network informs node B that queue A1 is no longer visible it will remove queue B1 from the network and de-allocate all buffers associated with the communication. If the network informs node A that queue B1 is no longer visible then node A will remove queue A1.

This will only occur if all possible paths between node A and node B have been removed, or one or both of the nodes decides to terminate communication.

If messages are not acknowledged in time by node B (via an acknowledgement message in queue A1) then node A will resend those messages.

Node B can increase or decrease the 're-order' buffer size at any time and will inform node A of the new size with a message to queue A1. It would change the size depending on the amount of data that could be allocated to an individual queue. The amount of data that could be allocated to a particular queue is dependent on:

1. How much memory the node has

2. How many queues it remembers

3. How many data flows are going through it

4. How many queues originate on this node

This resize message looks like this:

struct sResizeReOrderBuffer {
    // since messages can arrive out of order,
    // the version number will help the sending
    // node determine the most recent
    // 'ResizeReorderBuffer'.
    integer uiVersion;
    // the size of the buffer
    integer uiNewReOrderSize;
}

There is also a buffer on the send side (node A). The size of that buffer is controlled by the system software running on that node. It will always be equal to or less than the maximum window size provided by node B.

Nodes in the Data Stream

A node is considered in the data stream if it is on the path for data flowing between an ultimate sender and ultimate receiver. A node knows it is in the data stream because a directly connected node tells it that it is in the data stream.

Data may flow through a node that is not marked as in the data stream. Only a node marked as 'in the data stream' is considered to be in the data stream. A node with data flowing through it but that is not marked in the data stream is considered not to be in the data stream.

The first node to tell another node that it is 'in the data stream' is the node where the EUS resides that is sending a message to that particular queue. For example, if node B wants to send a message to queue A1, Node B would be the first node to tell another node that it is 'in the data stream' for queue A1. A node will send queues like queue B without marking them 'in the data stream'.

A node in a data stream for a particular queue will tell all its nodes that are 'chosen destinations' for that queue that they are in the data stream for that queue. If all the nodes that told the node that it was in the data stream tell it that it is no longer in the data stream, then that node will tell all its 'chosen destinations' that they are no longer in the data stream.

Basically, if a node is not in the data stream any more it tells all those nodes it has as chosen destinations that they are not in the data stream.

This serves two purposes. First, it allows the nodes in the data stream to instantly try to find better routes. Second, it ensures that nodes in the data stream do not 'forget' about the queues.

The structure used to tell another node that it is in the data stream is:

struct sDataStream {
    // the name of the queue, this could be replaced with a
    // number that maps to the queue name. (discussed later)
    sQName qnName;
    // true if now in the stream, false if not.
    bool bInDataStream;
};

Only data streams for queues of type B1 have the ability to create braided multi-path routes. Queues of type A1 that are in the data stream can be limited to a single path if desired, because ACK messages are both small and are easily merged together. Queues of type B are never marked as 'in the data stream'.

A possible enhancement would be GUID probing each node about to be added to the 'data stream' to be sure it is non-looping. (GUID probes are defined later.)

Node Tasks

Nodes communicate with directly connected nodes to send messages created by an EUS to another EUS. Nodes will also send messages used to establish and maintain a reliable communication with another EUS.

To send messages a node must determine:

1. Where to send messages

2. When to send messages

Each of these occasions is addressed in the following sections.

Where to Send Messages

To determine where it will send messages a node tries to pick a connected node:

1. That provides the best latency to the ultimate destination

2. That will not introduce a 'loop'

3. That is not at its sending capacity

Initial Queue Knowledge

When a queue is created by an EUS the system needs a way to tell every node in the network that the queue exists, and every node needs a path through other nodes to that queue. The goal is to create both the awareness of the queue and a path without loops.

When the EUS first creates the queue, the node that the queue is created on tells all directly connected nodes:

1. The name of the queue

-   This is a name that is unique to this queue. Two queues independently created should never have the same name.

2. Latency

-   Discussed previously. This is a value in seconds that describes how long it will take a message to travel from that node to the node that is the ultimate receiver.

3. ‘At capacity’ status

-   Discussed later. This is a boolean value that is true if any of the nodes in the path of chosen destinations for this node are unable to handle more data flow than they already have.

4. Update latency

-   Discussed later. This is a value in seconds that describes the maximum time a latency update from the ultimate receiver will take to reach this node.

5. Distance from data stream

-   Discussed later. Very similar to 'Update Latency', except that it describes how far this node is from a node marked in the data stream. This can be used to decide which queues are 'more important'. An alternative implementation could have it represent how far a node is from either a marked data stream, or a node carrying payload messages.

This update takes the structure of:

struct sQUpdate {
    // the name of the queue. Can be replaced with
    // a number (discussed later)
    sQName qnName;
    // the time it would take one message to travel
    // from this node to the ultimate receiver and be
    // consumed by the EUS
    float fLatency;
    // if true, this node is already handling as
    // much data as it can send. (discussed later)
    bool bAtCapacity;
    // the maximum time a latency update will
    // take to travel from the ultimate receiver
    // to this node. (discussed later)
    float fUpdateLatency;
    // calculated in a similar fashion
    // to 'fUpdateLatency', and records the distance
    // from a marked data stream for this node.
    float fLatencyFromStream;
};

Regardless of whether this is a previously unknown queue, or an update to an already known queue, the same information can be sent.

Delayed sending of node updates and the ordering of node updates should follow the same approach described previously.

If this is the first time a directly connected node has heard about that queue it will choose the node that first told it as its 'chosen destination' for messages to that queue. A node will only send EUS messages/payload to a node or nodes that are 'chosen destinations', even if other nodes tell it that they too provide a route to the EUS created queue.

If a node picks a directly connected node as a 'chosen destination', it must tell that node that it was selected as a 'chosen destination'. The structure of the message looks like this:

struct sPickedAsChosenDestination {
    // the name of the queue. Could be replaced with a number
    // (discussed later)
    sQName qnName;
    // true if the node this message is being sent to is a
    // chosen destination for this queue.
    bool bSelected;
};

A node will never pick another node as a 'chosen destination' if that node already has this node as a 'chosen destination' for that queue. If this happens because both nodes pick each other at the same time, it needs to be resolved instantly.

One approach would be for both nodes to remove each other as chosen destinations, wait a random amount of time and then try to re-select each other.

In this fashion a network is created in which every node is aware of the EUS created queue and has a non-looping route to the EUS queue through a series of directly connected nodes.

FIG. 32 is a series of steps showing knowledge of a queue propagating through the network. The linkages between nodes and the number of nodes in this diagram are exemplary only; in fact there could be indefinite variations of linkages within any network topography, from any node and between any number of nodes.

At no point does any node in the network attempt to gather global knowledge of network topology or routes. The system provides every node with the names of the EUS created queues and the latencies the directly connected nodes provide to the EUS created queues.

Even if a node has multiple possible paths for messages, it will only send messages along the node or nodes that it has chosen as its ‘chosen destinations’.

When a node has selected another directly connected node as its ‘chosen destination’, it will tell that node of its choice in order to avoid loops that may be created if two nodes pick each other as ‘chosen destinations’.

Every node keeps track of which queues it has told its directly connected nodes about. Every new queue that the directly connected node has not been told about will be immediately sent (see Propagation Priorities). In the case of a brand new connection, nodes on either side of that connection would send knowledge of every queue they were aware of.

If a node does not contain enough memory to store the names, latencies, etc. of every queue in the network, the node can ‘forget’ those queues it deems unimportant. The node will choose to forget those queues where this node is furthest from a marked data stream. The node will use the value ‘fLatencyFromStream’ to decide how far this node is from the marked data stream.

An alternative embodiment could use the value ‘fLatencyFromStream’ to represent its distance from either a marked data stream, or a node carrying payload packets.

The only side effect of this would be an inability to connect to those queues, both for this node and for those nodes that rely exclusively on this node as a destination for connecting to those queues.

The value ‘fLatencyFromStream’ can be used to help determine which queues are more important (see Propagation Priorities). If the node is 100 seconds from the marked data stream for queue A, and 1 second away from a marked data stream for queue B, it should choose to remember queue B, because this node is closest to a marked data stream for that queue and can be of more use in helping to find alternative paths.
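
A minimal sketch of this eviction policy follows, under the assumption that each known queue is held in a simple record; the record type, field grouping and function names are illustrative only.

#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative record for a queue this node knows about.
struct KnownQueue {
    std::string qnName;            // queue name
    float fLatency;                // latency to the ultimate receiver
    float fLatencyFromStream;      // distance from a marked data stream
};

// Sketch only: when memory is short, drop the queues that are furthest from
// a marked data stream (largest fLatencyFromStream) until we fit the budget.
void forgetLeastImportantQueues(std::vector<KnownQueue>& queues, std::size_t maxQueues)
{
    if (queues.size() <= maxQueues)
        return;

    // keep the queues closest to a marked data stream
    std::sort(queues.begin(), queues.end(),
              [](const KnownQueue& a, const KnownQueue& b) {
                  return a.fLatencyFromStream < b.fLatencyFromStream;
              });
    queues.resize(maxQueues);      // 'forget' the rest
}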

A node that is told about a new queue name with a latency of infinity (discussed later) will ignore that queue name.

Queue Name Optimization and Messages

Every queue update needs a way to identify which queue it references. Queue names can easily be long and inefficient to send. Nodes can become more efficient by using numbers to represent long names.

For example, if node A wants to tell node B about a queue named ‘THISISALONGQUEUENAME.GUID’, it could first tell node B that:

1=‘THISISALONGQUEUENAME.GUID’

A structure for this could look like:

struct sCreateQNameMapping {
    int nNameSize;
    char cQueueName[Size];
    int nMappedNumber;
};

Then, instead of sending the long queue name each time it wants to send a queue update, it could send a number that represented that queue name. When node A decides it no longer wants to tell node B about the queue called ‘THISISALONGQUEUENAME.GUID’, it could tell node B to forget about the mapping.

That structure would look like:

struct sRemoveQNameMapping {
    int nNameSize;
    char cQueueName[Size];
    int nMappedNumber;
};

Each node would maintain its own internal mapping of which names map to which numbers. It would also keep a translation table so that it could convert a name from a directly connected node to its own naming scheme. For example, node A might use:

1=‘THISISALONGQUEUENAME.GUID’

And node B would use:

632=‘THISISALONGQUEUENAME.GUID’

Thus node B would have a mapping that would allow it to convert node A's numbering scheme to a numbering scheme that makes sense for node B. In this example it would be:

Node A    Node B
1         632
. . .     . . .

Using this numbering scheme also allows each message to be tagged with the queue it is destined for. For example, if the system had a message of 100 bytes, it would reserve the first four bytes to store the queue name the message belongs to, followed by the message. This would make the total message size 104 bytes. An example of this structure also includes the size of the message:

struct sMessage {
    int uiQueueID;
    int uiMsgSize;
    char cMsg[uiMsgSize];
};

When this message is received by the destination node, that node would refer to its translation table to decide which queue this message should be placed in.
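
By way of illustration only, the translation step could be sketched as follows, assuming each node keeps, per directly connected node, a map from the sender's queue numbers to its own; the container and function names are assumptions for the example rather than part of the specification.

#include <cstdint>
#include <unordered_map>

// Illustrative sketch: converts the sending node's queue numbers into this
// node's own queue numbers (e.g. node A's 1 -> node B's 632).
struct QueueNumberTranslator {
    std::unordered_map<uint32_t, uint32_t> remoteToLocal;

    void addMapping(uint32_t remoteId, uint32_t localId) {
        remoteToLocal[remoteId] = localId;      // from sCreateQNameMapping
    }
    void removeMapping(uint32_t remoteId) {
        remoteToLocal.erase(remoteId);          // from sRemoveQNameMapping
    }
    // returns true and fills localId if the sender's number is known
    bool toLocal(uint32_t remoteId, uint32_t& localId) const {
        auto it = remoteToLocal.find(remoteId);
        if (it == remoteToLocal.end()) return false;
        localId = it->second;
        return true;
    }
};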

Path to Queue Removed

If a node that is on the path to the node where the original queue was created is disconnected from the node that it was using as its only ‘chosen destination’, that node will first attempt to find a non-looping alternative path.

It will do this by examining all nodes that are not currently sending to this node (i.e., that do not have this node as a ‘chosen destination’). If it picked a node that has this node as a ‘chosen destination’, a loop would be created.

This node will use a GUID probe to check for loops in the remaining possible nodes using the ‘GUID probe’ process described later in ‘Adding additional routes’.

If all potential alternative paths are loops, the node will set its latency to infinity, and tell all connected nodes immediately of this new latency.

If a node's ‘chosen destination’ tells it a latency of infinity, the node will instantly stop sending data to that node and will remove that node as a ‘chosen destination’. If all ‘chosen destinations’ have been removed, the node will set its own latency to infinity and immediately tell its directly connected nodes.

Once a node has set its latency for a queue to infinity and told its directly connected nodes, it waits for a certain time period (one second, for example). At the end of this time period the node will instantly choose as a chosen destination any directly connected node that does not have a latency of infinity, and resume the sending of data.

If it does not see a suitable new source within double the original fixed time period (2 seconds, for example) after the first time period has elapsed, it will delete messages from that queue and remove knowledge of that queue.

This time period is based on a multiple of how long it would take this node to send the update that this queue has gone to infinity (see Propagation Priorities later). This value is then multiplied by 10, or a suitably large number that is dependent on the interconnectedness of the network.

For example, if the network is very large and sparsely connected, the number would be higher than 10. In a dense, well connected network, the value would be 10.

If a node's latency moves from infinity to non-infinity it will immediately tell all directly connected nodes of its new latency.
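
A condensed sketch of the re-selection step in this go-to-infinity and recovery behaviour is given below; the structure and function names are illustrative, and the waiting periods (one second, then a further two seconds in the examples above) are assumed to be handled by the caller rather than inside the function.

#include <limits>
#include <vector>

// Illustrative per-neighbour record for one queue on one node.
struct NeighbourInfo {
    float fLatency;            // latency this neighbour currently advertises
    bool  bChosenDestination;  // true if we send via this neighbour
};

const float kInfinity = std::numeric_limits<float>::infinity();

// Sketch only: after the pause that follows going to infinity, pick the
// lowest non-infinity latency neighbour as the new chosen destination.
// Returns false if no route is currently available (the caller would then
// wait the further period and, failing that, delete the queue).
bool pickNonInfinityDestination(std::vector<NeighbourInfo>& neighbours,
                                float& fOwnLatency)
{
    NeighbourInfo* best = nullptr;
    for (NeighbourInfo& n : neighbours)
        if (n.fLatency != kInfinity && (!best || n.fLatency < best->fLatency))
            best = &n;                     // lowest non-infinity latency so far

    if (!best) {
        fOwnLatency = kInfinity;           // still no route to the ultimate receiver
        return false;
    }
    best->bChosenDestination = true;       // resume sending via this neighbour
    fOwnLatency = best->fLatency;          // own latency becomes non-infinity again
    return true;
}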

In this example, in a network with ten nodes, an EUS has created a queue on one of the nodes that has a direct connection to two nodes, one on each side of the network.

In FIG. 33, every node in the network has just become aware of the EUS created queue (which has zero latency, lower right); the numbers in each node represent the latency in seconds as defined above.

Next, in FIG. 34, one of the connections to the node with the EUS created queue is removed.

The directly connected node that lost its connection to the node with the EUS created queue will check to see if any of the nodes it is directly connected to are using it as a sender. Since all of them are, it does not need to probe them with GUIDs to determine if they loop. This node then sets itself to a latency of infinity. This is shown in FIG. 35.

It immediately tells all directly connected nodes of its new latency. If all of a node's ‘chosen destinations’ are at infinity, that node's latency becomes infinity as well. This is shown in FIG. 36.

This process continues until all nodes that can be set to infinity are set to infinity. This is shown in FIG. 37.

At this point, every node that has been set to infinity pauses for a fixed amount of time (for example, one second), and then picks the lowest latency destination it sees that is not at infinity. This is shown in FIG. 38.

As soon as a node that was at infinity becomes non-infinity it will tell the nodes directly connected to it immediately. If one of those nodes is at infinity it will select the first connected node to provide it with a non-infinity latency as its chosen destination. This is shown in FIG. 39.

At this point the network connections have been re-oriented to enable transfer of all the messages destined for the EUS created queue to that queue.

If a node's latency for a queue is at infinity for more than several seconds, the node can assume that there is no other alternative route to the ultimate receiver, and any messages in the queue can be deleted along with knowledge of the queue.

FIG. 40 outlines the above processes.

Converging on Optimal Paths for Nodes ‘Not in the Data Stream’

Nodes are always trying to lower their latency to the originally created queue by selecting different chosen destinations.

Only the queue that is established on nodes between the ultimate sender and ultimate receiver for transferring EUS message data will use braided multiple paths for increased bandwidth. The ultimate sender marks this path by telling all ‘chosen destinations’ that they are in this data sending path (‘in the data stream’). Each of those ‘chosen destination’ nodes tells its own ‘chosen destination’ nodes that they too are ‘in the data stream’. FIG. 42 illustrates this.

If all senders to a particular node are disconnected or tell that node that it is no longer in the ‘data stream’, or the node that told it that it was in the ‘data stream’ tells it that it is no longer a ‘chosen destination’, then it will clear the ‘in the data stream’ flag and tell all its chosen destinations that they are no longer in the data stream.

If a node is currently in the path of EUS message transfers between two nodes with EUSs, it uses a different mechanism to select a new ‘chosen destination’.

If a node that has multiple chosen destinations is removed from the data stream, it will remove all chosen destinations except the one with the lowest latency. This enables the mechanism for finding loops to remain effective, since that mechanism will only work with one chosen destination.

A node that is not currently in the data stream will always try to improve its latency to the ultimate receiver by selecting a node with a lower latency than its current chosen destination.

A node needs to be sure that when it is selecting a different ‘chosen destination’ it will not introduce a loop.

A node looking to upgrade its connection will prefer any node that is not ‘at capacity’ (explained later) over any node that is ‘at capacity’, regardless of latency.

As the node is not currently in the path of EUS messages/payload, it is not allowed to use GUIDs or messages to check whether the possible new chosen destination would create a loop, because a network of any size would quickly be overrun with these messages.

Instead it watches the latency of a potential choice by waiting for a periodic, automatic latency update from that node and compares it with the latency of its current ‘chosen destination’.

In the circumstance where a potential new destination would create a loop if chosen, the major cause of apparent lower latency is lag introduced by the travel time for data in the loop between the current node and this potential new node.

For example, if every second the current node's latency increases by 1 s, and there was a loop with a three second lag between this node and the new potential ‘chosen destination’, the new potential ‘chosen destination’ would always appear to have a 3 second lower latency than the current chosen destination.

FIG. 43 is another example of a potential loop to be avoided.

If the current node chose this apparently ‘better choice’ it would create a loop in the system.

This is where the ‘fUpdateLatency’ value from the queue update is used. This number is the maximum time it takes for a latency update to travel from the node that created the queue. The actual calculation of this value is discussed later.

In the previous diagram node B is trying to decide if node F is a better choice than node A. It will compare the difference in ‘fUpdateLatency’ between node F and node A. The two values in this example would be:

Node A fUpdateLatency: 8 s

Node F fUpdateLatency: 13 s

Since node A is the currently chosen destination, and node F's ‘fUpdateLatency’ is higher than node A's ‘fUpdateLatency’, node B needs to check to see if node F is actually routing its messages through a series of nodes to node B.

Node B can't immediately discard node F as a valid new ‘chosen destination’ just because it has a higher ‘fUpdateLatency’. This is because the alternative route that node F provides, although potentially a longer path to the ultimate destination, could be faster because of congestion on the route provided by node A.

The basic idea behind passive loop testing is the following.

-   -   The fUpdateLatency difference between A and F (in this example 5 seconds) is how long it will take, at maximum, for a latency update sent from node B to reach node F.
-   -   If a loop is present, then the maximum latency value from node F during this period of time will be greater than the median latency value from node A over a period of the same length immediately before this one.

The total time period for the median must never be longer than the value of node A's ‘fUpdateLatency’. For example, if the difference between the ‘fUpdateLatency’ values of node A and node F was 500 seconds, and node A's ‘fUpdateLatency’ was 8 seconds, the time period for calculating the median would be only 8 seconds. The time period watching for a maximum would be 500 seconds.

FIG. 44 illustrates this.

This technique may yield a false positive for a loop; however, it will only very rarely yield a false negative. Dealing with a loop is discussed later.

Using a median in the above case would be ideal; however, calculating a true median requires storing all the observations. Below is a pseudo code algorithm that approximates a median and requires only a low, fixed overhead.

float fPart1 = 0;
float fPart2 = 0;
int nCount = 0;
while (not done all observations) {
    float fCurOb = GET_CURRENT_OBSERVATION();
    fPart1 = fPart1 + fCurOb;
    nCount = nCount + 1;
    fPart2 = fPart2 + abs(fPart1 / nCount - fCurOb);
}
float fCloseToMedian = fPart1 / nCount - fPart2 / nCount;

If the observations' time periods are too small they will get rounded up to one iteration of the low resolution timer.

During the observation period for both the median and the maximum, the values of fUpdateLatency may change. If the difference between the two ‘fUpdateLatency’ values increases, the new, increased time period will be used. Lower values will be ignored. This can lead to the circumstance where the ‘median’ time period will be smaller than the ‘maximum’ time period. This is fine.

If the ‘fUpdateLatency’ for node F is less than node A's, or becomes less during the course of the comparison, then no loop is possible and the node can select node F as a new chosen destination without further delay.

If the queue on this node is ‘at capacity’, we'll prefer to pick a node with a higher latency that is not ‘at capacity’. This node will still wait the appropriate time to be sure that this ‘not at capacity’ node stays ‘not at capacity’. If the considered node turns ‘at capacity’ during the time period, but it provides a lower latency and does not form a loop, then this node can use that node as a ‘chosen destination’.

If this node is currently at infinity, this process will not be used (see previous).

If during the ‘maximum’ time period a latency update arrives from node F greater than the median of node A, then the test will end, indicating a loop. Node B will not wait for the entire ‘maximum’ time period to expire. The exception to this is if this node is ‘at capacity’ and the node being considered is not ‘at capacity’.

Since a node not in the data stream can have only one chosen destination for that queue, when it picks a new chosen destination it will stop using the old chosen destination.

When a node not in the data stream switches to a new chosen destination, it will record the difference between the current chosen destination's ‘fUpdateLatency’ and the new chosen destination's ‘fUpdateLatency’. This value will be stored and used to help detect a loop (discussed later).
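
For illustration only, the passive loop test described above could be sketched as below. The observation windows (the ‘median’ and ‘maximum’ periods) are assumed to have been collected already; the helper reuses the same fixed-overhead median approximation as the pseudo code above, and all names are illustrative.

#include <algorithm>
#include <cmath>
#include <vector>

// fixed-overhead approximation of a median (same idea as the pseudo code above)
float approximateMedian(const std::vector<float>& observations)
{
    float fPart1 = 0, fPart2 = 0;
    int nCount = 0;
    for (float fCurOb : observations) {
        fPart1 += fCurOb;
        ++nCount;
        fPart2 += std::fabs(fPart1 / nCount - fCurOb);
    }
    return nCount ? fPart1 / nCount - fPart2 / nCount : 0.0f;
}

// Returns true if the candidate (node F) looks like it would form a loop:
// its maximum observed latency during the watch window exceeds the median
// latency observed from the current chosen destination (node A).
bool candidateLooksLikeLoop(const std::vector<float>& currentDestLatencies,
                            const std::vector<float>& candidateLatencies)
{
    if (candidateLatencies.empty()) return false;
    float fMedianCurrent = approximateMedian(currentDestLatencies);
    float fMaxCandidate  = *std::max_element(candidateLatencies.begin(),
                                             candidateLatencies.end());
    return fMaxCandidate > fMedianCurrent;
}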

At Capacity Checking

Each queue of each node also has a mechanism to detect when it is sending or receiving data at capacity. A queue on a node is considered at capacity when the latency of data in its queue exceeds

max([all chosen destination latencies]) − min([all chosen destination latencies])

for more than 5 time intervals, for example. A time interval is defined as the time in which every destination able to send has sent a certain number of messages (for instance, 10), with a minimum of a certain time period (for example, double the minimum granularity of the fast system timer), or a maximum of another time period (for example, 6 seconds).

It is important that enough time has elapsed during the time interval that the chosen destinations have had the chance to bring the total amount of data in the queue to the lowest point possible. For example, if data is flowing in at 100 bytes every second, and flowing out at 500 bytes every five seconds, an absolute minimum time interval of 5 seconds would be required.
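
A minimal sketch of the ‘at capacity’ test is shown below, assuming the queue latency observed at the end of each time interval has already been recorded; the threshold and the interval count of 5 follow the examples in the text, and the function name is illustrative.

#include <algorithm>
#include <vector>

// Illustrative sketch of the 'at capacity' check for one queue.
bool isAtCapacity(const std::vector<float>& chosenDestinationLatencies,
                  const std::vector<float>& queueLatencyPerInterval)
{
    if (chosenDestinationLatencies.empty()) return false;

    float fMax = *std::max_element(chosenDestinationLatencies.begin(),
                                   chosenDestinationLatencies.end());
    float fMin = *std::min_element(chosenDestinationLatencies.begin(),
                                   chosenDestinationLatencies.end());
    float fThreshold = fMax - fMin;

    // count trailing consecutive intervals where queue latency stayed above
    // the threshold
    int nIntervalsOver = 0;
    for (auto it = queueLatencyPerInterval.rbegin();
         it != queueLatencyPerInterval.rend() && *it > fThreshold; ++it)
        ++nIntervalsOver;

    return nIntervalsOver > 5;      // 'for more than 5 time intervals'
}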

FIG. 45 is an example of queue levels and minimums during a time interval.

A node is considered at capacity if it is unable to bring the queue latency down to this level over this time period. If it is unable to do so, then there is too much data flowing into the node to successfully send it out almost as soon as it arrives.

When a node is at capacity it tells all nodes that are connected to it. If all ‘chosen destinations’ for a queue on a node are marked ‘at capacity’, then that node tells all its directly connected nodes that it is also at capacity.

‘At capacity’ updates travel through the network at the same time as normal latency updates. They do not preempt normal data flow (see sQueueUpdate previously).

As discussed previously, nodes that are not in the flow of data will attempt to find non-looping alternatives to ‘chosen destinations’ that become marked ‘at capacity’. If a node is in the data stream, it will not attempt to remove an ‘at capacity’ node as a ‘chosen destination’ because of its ‘at capacity’ status; it will make its decision to remove that node based on latency only.

Finding Additional Routes when at Capacity

A node that is ‘at capacity’ because it has too much data flowing into it will make a list of all possible additional routes using directly connected nodes. A possible additional route is a node that:

-   -   1. Is not at capacity
-   -   2. Is not sending to this node
-   -   3. Will not create a loop
-   -   4. Has a destination for the queue other than the querying node (i.e., is not loop creating)

For each of these possible routes the at-capacity node will create a unique GUID. This GUID will be sent down each possible route to test each of the routes for a loop. If a loop is detected, that route is discarded from the list of possible additional routes.

Each GUID that corresponds to a possible route is sent to the next destination node along that route. That node will store and forward that GUID on to all nodes it has as ‘chosen destinations’. If the node chooses a new node for a destination then the GUID will be passed to that new node. A node will deactivate a GUID by telling all ‘chosen destinations’ to forget the GUID. If all the nodes telling it to remember the GUID tell it to stop remembering the GUID, or tell it that it is no longer chosen as a destination, or are disconnected, the GUID is deactivated.

FIG. 46 is an example of this. In FIG. 46, if the node at capacity sees a GUID it sent to a possible additional chosen destination, it knows that choice would be a bad choice.

In this same manner the GUID sent to a chosen route will enumerate all paths along which data could flow from that node.

If the machine that is at capacity sees one of the GUIDs it has sent out coming back to it from a node that is sending it data, then it knows that the node down which it sent the GUID forms part of a loop, and that possible route is eliminated as a choice to relieve the ‘at capacity’ status. A GUID message is composed of a GUID, the name of the queue in question, a ‘travel time’, and a note telling the node to either ‘remember’ or ‘forget’ this GUID.

When a node is told of a GUID to remember or forget it will send this message as soon as possible (see Propagation Priorities). If it has already seen and processed this GUID message it will ignore it.

A GUID message will take the structure of:

struct sGUIDProbe {
    // could also be a number that represents this queue
    // (discussed previously)
    sQueueName qnQueueName;
    // true if the node is supposed to remember this GUID,
    // false if it is supposed to forget it.
    bool bRememberGUID;
    // the actual GUID
    char cGUID[constant_Guid_Size];
    // how far the GUID will travel (based on fUpdateLatency)
    float fMaximumGUIDTravelTime;
};

The travel time for the GUID is set as triple (for example) the difference between the fUpdateLatency of the node looking for a new route and the fUpdateLatency of the possible new route that is not at capacity. Each time a node receives a GUID probe it subtracts its contribution to the fUpdateLatency value from the fMaximumGUIDTravelTime before it tells its directly connected nodes of this GUID probe (instead of adding this value to fUpdateLatency the way it normally does). If, after it subtracts its contribution from fMaximumGUIDTravelTime, the value is less than 0, the GUID probe is not passed on to any chosen destinations.

The value that it subtracts is based on the time for a round robin update of all the queues in the same class as the queue this GUID probe is based on (discussed later; see ‘Propagation Priorities’, ‘second group’).
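
As an illustration only, the forwarding of a GUID probe with a decrementing travel budget could be sketched as follows; the probe structure is abbreviated, the destination handles are simple integers, and the function and parameter names are assumptions for the example.

#include <vector>

// Abbreviated, illustrative probe carrying only the remaining travel budget.
struct GuidProbe {
    float fMaximumGUIDTravelTime;   // remaining travel budget in seconds
    // queue name / GUID / remember-or-forget flag omitted for brevity
};

// fMyUpdateContribution is the time this node adds to fUpdateLatency for the
// group this queue belongs to (see 'Propagation Priorities').
template <typename ForwardFn>
void forwardGuidProbe(GuidProbe probe, float fMyUpdateContribution,
                      const std::vector<int>& chosenDestinations,
                      ForwardFn sendToDestination)
{
    probe.fMaximumGUIDTravelTime -= fMyUpdateContribution;
    if (probe.fMaximumGUIDTravelTime < 0)
        return;                                  // budget exhausted, drop probe
    for (int dest : chosenDestinations)
        sendToDestination(dest, probe);          // pass on to each chosen destination
}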

The node that is at capacity will wait for a minimum of its initial ‘fMaximumGUIDTravelTime’ to give the GUIDs a chance to work through the network and loop back to the ‘at capacity’ node, if a loop exists. Once this time has elapsed, all potential choices whose GUID did not make it back to the node are considered valid options.

The lowest latency, not ‘at capacity’, non-looping node is chosen and a message is sent to that node indicating that it is now a ‘chosen destination’. This is done to prevent two directly connected nodes from choosing each other as destinations, which would create a loop.

If two directly connected nodes select each other as destinations at the same time, they will both instantly switch back to their previous destinations and retry the process of finding additional destinations. Since the GUID mechanism includes a random interval, the likelihood of the two nodes again selecting each other declines dramatically at each iteration.

If all possible routes came back as loops, the ‘at capacity’ node will remove the GUIDs. If this node is still ‘at capacity’ after a period of time it will retry the process of looking for alternatives. It will wait (for example) three times the maximum ‘fMaximumGUIDTravelTime’ used for the last round of GUID probes.

Even though a node has several choices of where to send data, the maximum latency allowed in the queue is still

max([all chosen destination latencies]) − min([all chosen destination latencies])

subject to available memory on that node. As soon as this new destination is chosen the node will be able to clear its ‘at capacity’ status.

This maximum is not a hard limit, since it is possible there may be outstanding flow control quota allowing for a bit more data to be sent (see flow control).

Every time a token update is sent to a node sending data to this node, the current minimum latency over the last time interval as well as the ‘at capacity’ flag is sent along as well. This enables sending nodes to have current latency data, enabling them to always choose the best route.

Removing Unused Additional Routes

Because nodes not in the data stream only ever have one chosen destination, they don't remove additional sources; instead they switch from one source to a better source (discussed previously).

Nodes in the data stream are the only nodes that are given the potential to develop multiple data paths (discussed previously).

If a node in the data stream does not use a particular ‘chosen destination’ to send data for a certain amount of time, then the node will remove that chosen destination from its list of chosen destinations and alert that node that it is no longer a chosen destination.

Telling a node that it is no longer a chosen destination will also remove that node's ‘in the data stream’ flag, unless another node that is ‘in the data stream’ has also selected it as a chosen destination.

The certain amount of time to wait before removing an unused chosen destination should be relatively long compared to the amount of time required to create the connection in the first place. The amount of time a chosen destination is maintained could also be dynamically adjusted over time based on how much time elapsed between when a node is removed and when it is re-added.

Deciding when to Add/Remove a Chosen Destination while not ‘At Capacity’

A node must always have at least one ‘chosen destination’ if any possible choice exists (if not, its latency would be at infinity).

If a node is in the data stream for a particular queue it may have more than one ‘chosen destination’ if the queue is the queue used to transfer data. In our TCP/IP handshake example, this would be queue B1 (see the earlier diagram).

If a node is not at capacity, and is not able to remove a ‘chosen destination’ because all of the ‘chosen destinations’ are too active to be removed (see previous), then it will try to add a new chosen destination with a latency that is less than the highest latency of all the ‘chosen destinations’.

It must only choose possible ‘chosen destinations’ that are not ‘at capacity’.

The node does this in the hope that it will replace its current ‘chosen destinations’ with better choices. This will allow the node to make the entire route faster, as well as needing less buffer space for messages passing through it.

The node will probe the possible choice with a GUID probe (described above). If the GUID probe fails (a loop was detected), then the next time the node attempts to optimize this connection it will pick another directly connected node with the next lowest latency.

FIG. 47 is a flowchart that illustrates this process.

Resolving Accidentally Created Loops

If a loop is accidentally created among nodes that are not part of the marked data stream, their latency and ‘fUpdateLatency’ will spiral upwards.

FIG. 48 shows a loop that was accidentally created in nodes not in the data stream.

Loops in nodes not in the data stream will be rare because of the way we compare latencies for possible new chosen destinations (see above).

Because we probe possible new destinations explicitly for loops using GUIDs, loops will not be created in the data stream except very, very rarely, as a result of intervening path changes after the GUID mechanism has been used.

Simple loops that do not involve nodes ‘at capacity’ or nodes that have gone to infinity will be easily resolved using the standard ‘passive’ loop finding mechanism.

Nodes in a loop will create the appearance of knowing about the queue with no actual connection to the ultimate receiver for that queue. For example, if a loop is maintained, and the actual ultimate receiver leaves the network, this loop would continue to self-maintain this queue knowledge.

This problem occurs when:

-   -   1. Nodes inside the loop are not ‘at capacity’ and nodes outside the loop are ‘at capacity’.
-   -   2. Nodes outside the loop are at ‘infinity’.

In both cases the solution is for the nodes to detect that there is a possibility of a loop and to change their latency to infinity in the same manner as discussed previously. This will cause the nodes to move into a non-loop state quickly.

If we're on a node that is not in the data stream, and there are directly connected nodes that are:

1. ‘At Capacity’ when this node is not

2. At a latency of infinity when this node is not

Then loop testing will be invoked.

During the process of choosing a new ‘chosen destination’ the node recorded the difference in the ‘fUpdateLatency’ values of the new ‘chosen destination’ and the old ‘chosen destination’. This time in seconds multiplied by three will be referred to as the ‘possible loop time’ (PLT).

Our loop testing will begin by recording the minimum ‘fUpdateLatency’, ‘fLatencyFromStream’ and ‘fLatency’ for the PLT.

If during two successive iterations all three recorded values (‘fUpdateLatency’, ‘fLatencyFromStream’ and ‘fLatency’) were less than in the iteration before, then a GUID probe is used to determine if there is in fact a loop. The GUID probe (see previously) is set up to travel for PLT * 5 (for example) through the network.

If a loop is detected then the node that detected it will go to infinity in the same manner as in ‘Path to Queue Removed’.

If the GUID probe fails then the node returns to its loop testing described above.

If this process repeats three times then the node will go to ‘infinity’ anyway (see ‘Path to Queue Removed’).

When to Send Messages

In determining when to send a message the node decides whether the node being sent to:

-   -   1. Has room to store the message.
-   -   2. Provides latency to the destination that is useful given the latencies of other directly connected nodes and the amount of data in this node's queue.

Send to Useful Chosen Destinations Only

Even if a node has chosen multiple ‘chosen destinations’ for sending messages, it does not mean that they will all be used. A ‘chosen destination’ will only be used if the current latency of data in the queue is equal to or greater than

[Chosen Destination Latency] − min([All Chosen Destination Latencies])

If a ‘chosen destination’ latency is x seconds over the minimum of all the ‘chosen destinations’ latencies, then x seconds of data would be stored on that node before using that chosen destination.
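
A minimal sketch of this selection rule follows; the function and parameter names are illustrative, and the function simply returns the indices of the chosen destinations that would be useful given the current queue latency.

#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative sketch: a chosen destination is used only once the data
// already queued covers that destination's latency penalty over the best
// (lowest latency) chosen destination.
std::vector<int> usefulChosenDestinations(const std::vector<float>& chosenDestinationLatencies,
                                          float fCurrentQueueLatency)
{
    std::vector<int> useful;
    if (chosenDestinationLatencies.empty()) return useful;

    float fMin = *std::min_element(chosenDestinationLatencies.begin(),
                                   chosenDestinationLatencies.end());
    for (std::size_t i = 0; i < chosenDestinationLatencies.size(); ++i) {
        // use destination i only if queue latency >= (its latency - minimum)
        if (fCurrentQueueLatency >= chosenDestinationLatencies[i] - fMin)
            useful.push_back(static_cast<int>(i));
    }
    return useful;
}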

If a chosen destination has a latency above the current queue latency (as defined previously), then we have the option of sending a message to that node asking it to inform us when the latency of that node drops below a specified value. Asking a node to send an update at a specified value will also cause the node to send the current latency.

This solves the problem of rapid updates required to keep the sender informed as to the latency of the receiver.

Latency and ‘at capacity’ updates are passed both in token updates (defined later), as well as in a constant stream that is throttled not to exceed X % of node to node bandwidth. Usually this number would be 1-5%. The node would cycle through all known available latencies in a round-robin fashion (see Propagation Priorities). Other ways to determine what order or frequency to send queue updates could also be used:

-   -   1. Percentage change.
-   -   2. A particular class of queue names is marked for more frequent updating.
-   -   3. A ‘distance from data stream’ counter could be used to increase latency updates in the vicinity of the data stream.

If no queue messages are sent to a chosen destination for a certain time period, then that chosen destination is removed from the list of chosen destinations for that node. This time period would be at least an order of magnitude greater than the total time needed to establish the destination initially. An adaptive approach could also be used (described previously).

Flow Control

Each node has a variable amount of memory, primarily RAM, used to support information relevant to connections to other nodes and queues, e.g. message data, latencies, GUIDs, chosen destinations, etc.

An example of the need for flow control is if node A has chosen node B as a destination for messages. It is important that node A is not allowed to overrun node B with too much data.

Flow control operates using the mechanism of tokens. Node B will give node A a certain number of tokens corresponding to the number of bytes that node A can send to node B. Node A is not allowed to transfer more bytes than this number. When node B has more space available and it realizes node A is getting low on tokens, node B can send node A more tokens.

There are two levels of flow control. The first is node-to-node flow control and the second is queue-to-queue flow control. Node-to-node flow control is used to constrain the total number of bytes of any data (queues and system messages) sent from node A to node B. Queue-to-queue flow control is used to constrain the number of bytes that move from a queue in node A to a queue in node B with the same name.

For example, if 10 bytes of queue message move from node A to node B, it costs 10 tokens in the node-to-node flow control as well as 10 tokens in the queue-to-queue flow control for that particular queue.

When node B first gives node A tokens, it limits the total number of outstanding tokens to a small number as a start-up state from which to adjust to maximize throughput from node A.

Node B knows it has not given node A a high enough ‘outstanding tokens’ limit when two conditions are met:

-   -   Node A has told node B that it had more messages to send but could not because it ran out of tokens, and
-   -   Node B has encountered a ‘no data to send’ condition where a destination would have accepted data if node B had had it to send.

If node A has asked for a higher ‘outstanding tokens’ limit and node B has not reached a ‘no data to send’ condition, node B will wait for a ‘no data to send’ condition before increasing the ‘outstanding tokens’ limit for node A.

Node B will always attempt to keep node A in tokens no matter the ‘outstanding tokens limit’. Node B keeps track of how many tokens it thinks node A has by subtracting the sizes of messages it sees from the number of tokens it has given node A. If it sees node A is below 50% of the ‘outstanding limit’ that node B assigned node A, and node B is able to accept more data, then node B will send more tokens up to node A. Node B can give node A tokens at its discretion up to the 50% point, but at that point it must act.

Assigning more tokens represents an informed estimate on node B's part as to the maximum number of tokens node A has available to send data with.

This number of tokens, when added to node B's informed estimate of the number of tokens node A has, will not exceed the ‘outstanding tokens’ limit. It may also be less, depending on the amount of data in node B's queue (discussed later).
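
For illustration only, the receiver-side token bookkeeping described above could be sketched as follows; the structure, field and function names are assumptions for the example, not terms used by the specification.

// Illustrative receiver-side state for one sender (node A) and one queue.
struct SenderQuotaState {
    unsigned int uiOutstandingLimit;   // maximum tokens node A may hold
    unsigned int uiTokensOutstanding;  // node B's estimate of node A's tokens
};

// Called whenever node B receives bytes from node A for this queue.
void onBytesReceived(SenderQuotaState& s, unsigned int uiBytes)
{
    s.uiTokensOutstanding = (uiBytes > s.uiTokensOutstanding)
                                ? 0 : s.uiTokensOutstanding - uiBytes;
}

// Returns how many tokens node B should send to node A right now: top the
// sender back up to the outstanding limit once the estimate falls below 50%,
// provided node B can still accept more data.
unsigned int tokensToGrant(const SenderQuotaState& s, bool bCanAcceptMoreData)
{
    if (!bCanAcceptMoreData) return 0;
    if (s.uiTokensOutstanding >= s.uiOutstandingLimit / 2) return 0;
    return s.uiOutstandingLimit - s.uiTokensOutstanding;
}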

For example, let us consider node A and node B, which are negotiating so that node A can send to node B. FIG. 49 shows the current state.

Node B has created the default quota it wants to provide to node A. It then sends a message to node A with the quota (the difference between the current and the maximum). It also includes a version number that is incremented each time the maximum limit is changed. The message node B sends to node A looks like this:

struct sQuotaUpdate {
    // the version
    unsigned int uiVersion;
    // the queue name or number (see previous)
    sQNName qnName;
    // how much additional quota is sent over
    unsigned int uiAdditionalQuota;
};

We do this so that when node A tells us that it wants to send more data, it will only do so once for each time we adjust the maximum limit. FIG. 50 shows the current state.

If node A wants to send a message of 5 bytes to node B, it will not have enough quota. Node A would then send a message to node B saying ‘I'd like to send more’. It will then set its ‘Last Want More Ver’ to match the current version. This will prevent node A from asking over and over again for more quota if node B has not satisfied the original request. This message looks like this:

struct sRequestMoreQuota {
    // the queue name or number (see previous)
    sQNName qnName;
};

FIG. 51 shows this state.

Node B has no data in its queue and yet it would have been able to send to its chosen destination, so it will increase the maximum quota limit for node A to 100 bytes. It will send the new quota along with the new version number. FIG. 52 shows this state.

Node A now has enough quota to send its 5 byte message. When the message is sent, node A removes 5 bytes from its available quota. When the message is received by node B, it removes 5 bytes from the current quota it thinks node A has. FIG. 53 shows this state.

Messages can continue to flow until node A runs out of quota or messages to send. If the quota that node B thinks node A has drops below 50 bytes, node B will send a quota update immediately. A quota update that does not change the maximum limit will not result in the version being incremented. Quota updates for different queues can piggyback together; thus if one quota update ‘needs’ to be sent, others that just need a top-off can be sent at the same time. This will reduce the incidence of a special message being sent with just one quota update.

In general, system messages can also be piggy-backed with data messages to reduce their impact.

The same approach to expanding the ‘outstanding limit’ for queue-to-queue flow control also applies to node-to-node flow control.

The ‘outstanding limit’ is also constantly shrunk at a small but fixed rate by the system (for example, 1% every second). This allows automatic correction over time for ‘outstanding limits’ that may have grown large in a high capacity environment but are now in a low capacity environment where the ‘outstanding limit’ is unnecessarily high. If this constant shrinking drops the ‘outstanding limit’ too low, then the previous mechanism (requesting more tokens, and more being given if the receiving node encounters a ‘no data to send’ condition) will detect it and increase it again.
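
A minimal sketch of this decay, assuming a linear 1% per elapsed second rate and an assumed floor of one token so the limit never reaches zero, might look like the following; the names and the floor are illustrative only.

// Illustrative sketch: apply the constant downward pressure to an
// 'outstanding limit'.
void shrinkOutstandingLimit(unsigned int& uiOutstandingLimit,
                            float fElapsedSeconds,
                            unsigned int uiMinimumLimit = 1)
{
    float fShrunk = uiOutstandingLimit * (1.0f - 0.01f * fElapsedSeconds);
    uiOutstandingLimit = (fShrunk < uiMinimumLimit)
                             ? uiMinimumLimit
                             : static_cast<unsigned int>(fShrunk);
}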

At Capacity Flow Control

When giving other nodes quota to send, it is important that they are given enough quota to move the receiving node to an ‘at capacity’ state and keep it there if possible.

If the latency in a queue on the node is over (max([latency all chosen destinations]) − min([latency all chosen destinations])), then each incoming flow of data must not receive more than its maximum ‘outstanding limit’ of quota over this maximum latency.

This is implemented by having an ‘over capacity token count’ variable attached to the flow control structures on the receiving side that records the number of bytes received from that source while the queue is over capacity.

This number is subtracted from the ‘max outstanding limit’ when it comes to providing the sending node with more quota.

If the queue latency drops below its maximum latency, the ‘over capacity token count’ variable is set to 0.

When data is removed from a queue that is above capacity, we take the number of bytes that have been removed and subtract that sum of bytes as evenly as we can from all ‘over capacity token count’ variables that are greater than zero. It is important that the ‘over capacity token count’ is always equal to or greater than zero.

For example, if 120 bytes are removed from the queue and there are four connections putting data into that queue whose ‘over capacity token counts’ are 0, 100, 20 and 50, we would divide the number of bytes (120) by the number of ‘over capacity token count’ variables greater than zero (three), which gives us 40. Since the lowest non-zero ‘over capacity token count’ variable (20) is less than 40, we subtract that number (20) from each of the three non-zero variables. This leaves us with 0, 80, 0, 30 and 60 bytes still left. We repeat the process with the two remaining non-zero variables: 60 divided by two is 30, and the lowest remaining variable is 30, so we subtract 30 from each, leaving 0, 50, 0, 0.
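
By way of illustration only, the even-subtraction procedure could be sketched as follows; the function name is an assumption, and running it on the counts {0, 100, 20, 50} with 120 freed bytes reproduces the worked example above, yielding {0, 50, 0, 0}.

#include <algorithm>
#include <vector>

// Illustrative sketch: spread freed bytes as evenly as possible across the
// non-zero 'over capacity token count' variables without going below zero.
void distributeFreedBytes(std::vector<unsigned int>& overCapacityCounts,
                          unsigned int uiBytesRemoved)
{
    while (uiBytesRemoved > 0) {
        // collect the counts that are still greater than zero
        std::vector<unsigned int*> active;
        for (unsigned int& c : overCapacityCounts)
            if (c > 0) active.push_back(&c);
        if (active.empty()) return;

        // even share for this pass, bounded by the smallest active count
        unsigned int uiShare = uiBytesRemoved / static_cast<unsigned int>(active.size());
        if (uiShare == 0) uiShare = 1;           // keep making progress on remainders
        unsigned int uiSmallest = **std::min_element(
            active.begin(), active.end(),
            [](unsigned int* a, unsigned int* b) { return *a < *b; });
        unsigned int uiStep = std::min(uiShare, uiSmallest);

        for (unsigned int* c : active) {
            if (uiBytesRemoved == 0) break;
            unsigned int uiTake = std::min({uiStep, uiBytesRemoved, *c});
            *c -= uiTake;
            uiBytesRemoved -= uiTake;
        }
    }
}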

Flow Control for EUS Queues

In TCP/IP, window size selection is important. If the window size in TCP/IP is too small, performance will suffer; if it is too large, system resources will be used up without increasing performance.

This invention allows rapid convergence to the best window size using a ‘send-side only’ algorithm.

Nodes that are part of the marked data stream will only buffer enough data to ensure they can send at maximum speed. This means that even if there are gigabytes of data to send, only a relatively fixed, small percentage will ever be in transit at a given time.

However, if there are gigabytes of data to send (instead of just one small message), many more paths will be used to transfer that data. No matter how many paths were used, the total amount of data in transit would not exceed the buffer provided to the ultimate sender by the ultimate receiver.

A key metric that a node uses to determine which nodes it will send to is latency. If there are a thousand seconds of data remaining to send, then all paths with a latency to the destination of under 1000 seconds should be considered. If there is a very small amount of data and the latency to send it is 10 ms, then very few paths (and only the fastest) will be used to transfer data.

This allows nodes to recruit as many or as few nodes as needed to ensure the fastest transfer of data. This technique allows us to implicitly increase bandwidth when needed by trading off latency that is not needed.

The amount of data in transit is also limited by the size of the buffer the sending node can allocate to that queue. The best size for the send buffer is such that its latency is:

SendBufferLatency >= Max(AllChosenDestinationLatencies) − Min(AllChosenDestinationLatencies)

This means that if we can keep adding nodes to our chosen destination list, we'll be able to keep expanding our send buffer on the ultimate sender.

The node with the EUS sending the messages should allow this send buffer to grow to a point where the EUS can keep the queue ‘at capacity’ (in the same way as flow control works). This ensures that all ‘chosen destinations’ can be used as much as possible.

At the ultimate receiver, messages received are placed into the re-order buffer. As the node is able to place these messages in order, they are shifted into a queue that the EUS uses to de-queue messages for processing. The size of this de-queue buffer is set the same way as the queue buffers between nodes (discussed in flow control).

If the queue the EUS uses to retrieve messages exceeds its maximum size, this node tells its directly connected nodes that it is ‘at capacity’, and does not give any more quota to the directly connected nodes. Ordered messages from the re-order buffer are still placed into this queue used by the EUS; however, the flow of incoming messages to the re-order buffer will be cut off because this node is no longer handing out quota to directly connected nodes for this queue.

If the queue the EUS uses becomes completely empty, and directly connected nodes wanted to send more messages to the node with the EUS, then the maximum size of the queue that the EUS uses is expanded (in the same way the flow control works).

The size of this queue is also subject to downward pressure in the same way the queues are during flow control.

The size of the re-order buffer has no relation to the number of messages (or the number of bytes) that the queue used by the EUS can hold.

If the receiving EUS were to completely stop processing messages, all the nodes in the network would shift to ‘at capacity’ for the queue, and the ultimate sender would very quickly be given no more quota with which to push messages into the network.

Propagation Priorities

In a larger network, bandwidth throttling for control messages will need to be used.

We're going to use several types of throttling. Total ‘control’ bandwidth will be limited to a percentage of the maximum bandwidth available for all data.

Control messages will be broken into two groups. Both of these groups will be individually bandwidth throttled based on a percentage of maximum bandwidth. Each directly connected node will have its own version of these two groups.

For example, we may specify 5% of maximum bandwidth for each group, with a minimum size of 4K. On a simple 10 MB/s connection this would mean that we'd send a 4K packet of information every:

= 4096 / (10 MB/s * 0.05) = 0.0819 s

So on this connection we'd be able to send a control packet every 0.0819 s, or approximately 12 times every second for each group.
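
For illustration only, the interval between control blocks for one throttled group could be computed as sketched below; the function and parameter names are assumptions, and the exact units used for the link rate are left to the implementer.

// Illustrative sketch: time between fixed-size control blocks for one group,
// given the link rate and the fraction of it reserved for that group.
float controlSendIntervalSeconds(float fLinkBytesPerSecond,
                                 float fControlFraction,
                                 float fBlockSizeBytes)
{
    return fBlockSizeBytes / (fLinkBytesPerSecond * fControlFraction);
}

// Example corresponding to the text: a 4K block at 5% of the link bandwidth.
// float fInterval = controlSendIntervalSeconds(fLinkBytesPerSecond, 0.05f, 4096.0f);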

The percentages and sizes of blocks to send are examples, and can be changed by someone skilled in the art to better meet the requirements of their application.

First Bandwidth Throttled Group

The first bandwidth throttled group sends the following messages. These messages should be concatenated together to fit into the block size used for control messages.

-   -   1. Name to number mappings for queues needed for the following messages.
-   -   2. Standard flow control messages.
-   -   3. GUID probes.
-   -   4. Informing a node if it is now a ‘Chosen Destination’.
-   -   5. HSPP messages.
-   -   6. Initial Queue Knowledge/To Infinity/From Infinity of HSPP queues.
-   -   7. Initial Queue Knowledge/To Infinity/From Infinity of non-HSPP queues.

Second Bandwidth Throttled Group

The second group sends latency updates for queues. It divides the queues into three groups, and sends each of these groups in a round robin fashion interleaved with each other 1:1:1.

The first two groups are created by ordering all queues using the value of ‘fLatencyFromStream’. If the queue has multiple chosen destinations, then the ‘chosen destination’ with the lowest latency is used to decide which ‘fLatencyFromStream’ value we're going to use.

The queues are ordered in ascending order in a manner similar to that described previously in the single path embodiment. They are divided into two groups based on how many updates can be sent in half a second using the throttled bandwidth. This ensures that the first group will be entirely updated frequently, and the rest will still be updated, but less frequently.

The third group is composed of queues where this node is in the data stream.

Each latency update includes a value ‘fUpdateLatency’. This value ‘fUpdateLatency’ is calculated separately for queues in each of the three groups. It is calculated as the amount of time that it takes to send all items in the group once. This value is added to the ‘fUpdateLatency’ of the chosen destination with the lowest ‘fLatency’.
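
A minimal sketch of this calculation follows; the structure and function names are illustrative, and the time to send the group once is assumed to have been measured elsewhere from current send rates.

#include <vector>

// Illustrative per-chosen-destination values reported in queue updates.
struct ChosenDestinationInfo {
    float fLatency;         // latency this destination advertises
    float fUpdateLatency;   // update latency this destination advertises
};

// Sketch: this node's reported 'fUpdateLatency' for a queue is the time to
// send every update in the queue's group once, plus the 'fUpdateLatency' of
// the chosen destination with the lowest 'fLatency'.
float calculateUpdateLatency(float fTimeToSendGroupOnce,
                             const std::vector<ChosenDestinationInfo>& chosenDestinations)
{
    if (chosenDestinations.empty())
        return fTimeToSendGroupOnce;            // no downstream contribution

    const ChosenDestinationInfo* best = &chosenDestinations[0];
    for (const ChosenDestinationInfo& d : chosenDestinations)
        if (d.fLatency < best->fLatency)
            best = &d;                          // lowest-latency chosen destination

    return fTimeToSendGroupOnce + best->fUpdateLatency;
}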

This value is also used when determining how far a GUID probe will travel.

The time to send each of the three groups should be constantly updated based on current send rates.

A queue can only be a member of one of these groups at a time. This is important; otherwise the ‘fUpdateLatency’ would be difficult to calculate.

The ‘fLatencyFromStream’ is calculated the same way as ‘fUpdateLatency’, except that nodes in a data stream will not add the ‘fLatencyFromStream’ value from another node when they pass their ‘fLatencyFromStream’ on to directly connected nodes.

For example, if node A is in the data stream, and its time to update the group which the particular queue is in takes 3 seconds, it will tell all directly connected nodes that it is 3 seconds from the data stream. Alternatively, it could tell all directly connected nodes that it is 0 seconds from the data stream.

If a queue needs to move from a high frequency update to a low frequency update, we'll change its reported ‘fUpdateLatency’ number to match the lower frequency group, but keep the item in the high frequency group for three update cycles before actually moving it to the lower frequency group.

If a node becomes aware of a new queue, it will place that queue at the end of the update list of whichever of the three groups it belongs to within the second bandwidth throttled group.

Possible Uses

These are examples of where this invention could be used. These examples are not intended to limit the use of the invention.

-   -   1. Used in communication networks it would enable network topography to take unlimited structures.
-   -   2. Used in cell phone networks it would remove the need for the current ‘cells’ structure that needs to hand off a moving cell phone to the next communications tower.
-   -   3. Used in a grid computing environment to help eliminate hotspots and deal with failed nodes.
-   -   4. Used by utilities, with the software enabled in all electrical appliances, such appliances could be turned on or off from a central command centre in order to achieve system load management.
-   -   5. Used in computing it would enable multiple interconnected CPUs or computers to be linked in order to exchange messages for applications such as grid computing, mass storage or super-computing environments, which applications are currently constrained by the lack of flexible, dynamic message routing capability.
-   -   6. Used in military applications it would enable every soldier and every piece of equipment in a theatre of combat to be in constant communication across a continually changing topographical structure and enable the network to continue regardless of elements being removed or destroyed or added.
-   -   7. Used to form discrete network groups, either in isolation from or as a subset of larger networks, it would enable any group to form its own network at any time.
-   -   8. Used in traffic management it would enable motor vehicles equipped with this software and with communications ability to coordinate their highway interaction for greater efficiency or safety or highway traffic management facilitation.
-   -   9. Used in traffic management of traffic signals it would enable all traffic lights to communicate with traffic management computers and with each other for greater effectiveness in managing traffic flows, and enable traffic signals to be added or deleted from the system with no need for any software administration to the system.
-   -   10. Used as a ‘master network’ it could become the communication utility for a community or region, providing virtually unlimited capacity and back-up resources because every participant in the network could provide linkage to the whole network and the sum of its resources.
-   -   11. Used to manage a computing centre the software would add or subtract machines and applications and administer and monitor the centre without human intervention and without any need to curtail or cease operations while doing so.
-   -   12. Used within an electrical energy grid this software could be used to integrate generating, transmission and consumption to deal with both ordinary changes and untoward events by making decisions based on predetermined criteria and acting immediately.
-   -   13. Used to enable remote computing by dynamically linking users and remote sites with no human intervention.
-   -   14. Used in air traffic control by managing and coordinating aircraft, air traffic and ground resources.
-   -   15. Used to coordinate and network varying communications technologies such as wireless, land line, satellite, computer and airborne systems.
-   -   16. Used to create efficient routes for the physical delivery of goods to various destinations, such routes able to be altered dynamically for varying circumstances such as traffic pattern changes, additions or deletions to the route destinations.
-   -   17. Used as a mathematical tool, similar to biological computing, for solving multiple simultaneous computations to find a correct solution, especially to complex problems that involve many criteria.

The above-described embodiments of the invention are intended to be examples of the present invention, and alterations and modifications may be effected thereto by those of skill in the art without departing from the scope of the invention, which is defined solely by the claims appended hereto.

1. A computer readable medium for storing a set of programming instructions for execution by, or on behalf of, a first node on a self-organizing network having a plurality of nodes and at least one link interconnecting each of said nodes, said instructions causing a computing apparatus to identify the route between a source node and a destination node, wherein said instructions further cause the computing apparatus to send route updates about said destination node on a relatively more frequent basis the closer that said first node is to the route between said source node and said destination node.
 2. The computer readable medium as claimed in claim 1 wherein said instructions further cause the computing apparatus to identify the proximity of said first node to the identified route between said source node and said destination node.
 3. The computer readable medium as claimed in claim 1 wherein said node on the identified route between a source node and destination node will set the importance value of said destination node to a predefined value.
 4. The computer readable medium as claimed in claim 3 where the predefined value is the highest importance value possible.
 5. A computer readable medium for storing a set of programming instructions for execution by, or on behalf of, a first node on a self-organizing network having a plurality of nodes and at least one link interconnecting each of said nodes, said instructions causing a computing apparatus to assign an importance value to updates that are to be sent over said network, wherein said instructions further cause the computing apparatus to communicate to other nodes that said first node wishes only to receive a predetermined number of updates with the highest importance values.
 6. The computer readable medium as claimed in claim 5 wherein said importance value is determined by how close said first node is to a specified data path or specified structure in the network.
 7. The computer readable medium as claimed in claim 5 wherein said instructions further cause the computing apparatus to assign a hop cost value to updates that are to be sent over said network.
 8. The computer readable medium as claimed in claim 7 wherein said hop cost value for a particular destination node is determined by an accumulation of service characteristics on the route from said node to said destination node.
 9. The computer readable medium as claimed in claim 7 wherein said instructions further cause the computing apparatus to communicate to other nodes that said first node wishes only to receive updates that have or exceed a predetermined importance value.
 10. A computer readable medium for storing a set of programming instructions for execution by, or on behalf of, a first node on a self-organizing network having a plurality of nodes and at least one link interconnecting each of said nodes, said instructions causing a computing apparatus to forward messages from a source node to a destination node via neighbors depending on the latency to the destination node via said neighbors, wherein messages for a destination node are not sent to a neighbor node when the neighbor node is in a specified state regarding messages for said destination node.
 11. The computer readable medium as claimed in claim 10 wherein the latency of the internal message queue of messages for a destination node is used to decide which neighbor messages for said destination node should be sent to.
 12. The computer readable medium as claimed in claim 11 wherein messages for a destination node are sent to a neighbor node if the latency to said destination node from said neighbor node is equal to or less than the latency of the message queue for messages being sent to said destination node.
 13. The computer readable medium as claimed in claim 12 wherein messages for a destination node are not sent to a neighbor node when the neighbor node can not process an increased volume of messages for said destination node.
 14. A self-organizing network comprising: (a) a plurality of nodes; (b) at least one link interconnecting neighbouring ones of said nodes; (c) each of said nodes being operable to maintain information about each of said other nodes that is within a first portion of said nodes, said information including: (i) a first identity of another one of said nodes within said first portion; (ii) for each first identity, a second identity representing a neighbouring node that is a desired step to reach the said another one of said nodes respective to said first identity; (d) each of said nodes being operable to maintain a third identity representing a neighbouring node that is a desired step to send a request for information about said nodes in a second portion of said nodes that is not included in said first portion, wherein a network core is formed between neighbouring nodes that determine each other is a desired step to locate said nodes within said second portion.
 15. The network of claim 14 wherein said information includes, for each said first identity, a value representing a distance to a data marked stream for said node associated with said first identity.
 16. The network of claim 15 wherein nodes associated with said first identity are ranked in ascending order according to said distance and said instructions are delivered to those nodes according to said rank.
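By way of non-limiting illustration only, and not as part of the claims, the per-node knowledge of claims 14 to 16 could be organised as follows: for each known node (first identity) a next-step neighbour (second identity) is recorded, a separate default neighbour (third identity) is kept for requests about nodes outside the known portion, and entries may carry a distance value by which they can be ranked in ascending order. The names RoutingEntry, NodeTable and distance_to_data are assumptions made for this sketch.

    from dataclasses import dataclass
    from typing import Dict, List, Optional

    @dataclass
    class RoutingEntry:
        next_step: str                             # second identity: neighbour toward this node
        distance_to_data: Optional[float] = None   # distance value of claim 15 (assumed numeric)

    class NodeTable:
        """Hypothetical per-node knowledge of a first portion of the network."""
        def __init__(self) -> None:
            self.known: Dict[str, RoutingEntry] = {}       # first identity -> routing entry
            self.default_neighbour: Optional[str] = None   # third identity

        def next_step(self, node_id: str) -> Optional[str]:
            """Next neighbour toward node_id; requests about unknown nodes fall back
            to the default neighbour, through which the network core is reached."""
            entry = self.known.get(node_id)
            return entry.next_step if entry else self.default_neighbour

        def ranked_by_distance(self) -> List[str]:
            """Known nodes in ascending order of their distance value (claim 16)."""
            with_distance = [(nid, e.distance_to_data) for nid, e in self.known.items()
                             if e.distance_to_data is not None]
            return [nid for nid, _ in sorted(with_distance, key=lambda p: p[1])]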
 17. A computer readable medium for storing a set of programming instructions for execution by, or on behalf of, a node forming part of a self-organizing network having a plurality of other nodes and at least one link interconnecting neighbouring ones of said nodes; said programming instructions for causing a computing apparatus within said node to maintain information about each of said other nodes that are within a first portion of all of said other nodes, said information including:
 (a) a first identity of another one of said nodes within said first portion;
 (b) for each said first identity, a second identity representing a neighbouring node that is a desired step to reach said another one of said nodes respective to said first identity;
 said programming instructions further causing said computing apparatus to maintain a third identity representing a neighbouring node that is a desired step to send a request for information about said nodes in a second portion of said nodes that are not included in said first portion, and said programming instructions causing said computing apparatus to assign a value to said node that can be taken into account during the selection of parent nodes in said network; and wherein said third identity is determined based on which of said neighbouring nodes most frequently appears in each said second identity.
 18. The computer readable medium of claim 17 wherein, for each said first identity, said value represents a distance to a data marked stream for said node associated with said first identity.
 19. The computer readable medium of claim 18 wherein nodes associated with said first identity are ranked in ascending order according to said distance and said instructions are delivered to those nodes according to said rank.
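As a non-limiting illustration of the rule recited in claim 17, and not as part of the claims, the third identity (the neighbour to which requests about unknown nodes are sent) may be pictured as the neighbour that appears most frequently among the recorded second identities. The helper below assumes a simple mapping from first identities to second identities; its name and structure are hypothetical.

    from collections import Counter
    from typing import Dict, Optional

    def choose_default_neighbour(second_identities: Dict[str, str]) -> Optional[str]:
        """Pick the third identity as the neighbouring node most frequently
        appearing among the second identities in the node's table.

        'second_identities' maps each known node (first identity) to the
        neighbour recorded as the desired step toward it (second identity).
        """
        if not second_identities:
            return None
        counts = Counter(second_identities.values())
        neighbour, _ = counts.most_common(1)[0]
        return neighbour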
 20. A computer readable medium for storing a set of programming instructions for execution by, or on behalf of, a first node on a self-organizing network having a plurality of nodes and at least one link interconnecting said nodes, said instructions causing a computing apparatus to select and remove information about one or more missing nodes in said network by delaying the sending of predetermined classes of updates to said network, wherein a node update is delayed before being sent to a neighbor node if an update about said node has not been previously sent to said neighbor node.
 21. A computer readable medium for storing a set of programming instructions for execution by, or on behalf of, a first node on a self-organizing network having a plurality of nodes and at least one link interconnecting said nodes, said instructions causing a computing apparatus to select and remove information about one or more missing nodes in said network by delaying the sending of predetermined classes of updates to said network, wherein a node update is delayed before being sent to a neighbor node if the previous update about said node sent to said neighbor node belongs to a predetermined class of updates.
 22. The computer readable medium as claimed in claim 21 wherein said predetermined class is a node update indicating that no route is possible via the sending node.
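Finally, by way of non-limiting illustration only, and not as part of the claims, the delayed-update behaviour of claims 20 to 22 can be sketched as follows: before an update about a node is sent to a neighbouring node, it is delayed if no update about that node has previously been sent to that neighbour, or if the previously sent update belonged to a predetermined class, such as an update indicating that no route is possible via the sending node. The class label, the fixed delay and the function names below are assumptions.

    import time
    from typing import Dict, Tuple

    NO_ROUTE = "no_route_via_sender"   # assumed predetermined class of update (claim 22)
    DELAY_SECONDS = 5.0                # assumed delay applied before sending

    # (neighbour_id, node_id) -> class of the last update sent to that neighbour about node_id
    last_sent_class: Dict[Tuple[str, str], str] = {}

    def should_delay_update(neighbour_id: str, node_id: str) -> bool:
        """Delay the update if nothing about node_id was previously sent to this
        neighbour (claim 20), or if the previous update sent belonged to the
        predetermined class (claims 21 and 22)."""
        previous = last_sent_class.get((neighbour_id, node_id))
        return previous is None or previous == NO_ROUTE

    def send_update(neighbour_id: str, node_id: str, update_class: str) -> None:
        """Send, or delay and then send, an update about node_id to a neighbour."""
        if should_delay_update(neighbour_id, node_id):
            time.sleep(DELAY_SECONDS)  # stand-in for whatever delay mechanism is actually used
        # ... transmit the update over the link to neighbour_id here ...
        last_sent_class[(neighbour_id, node_id)] = update_class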